DESCRIPTION

POLYMORPHISMS IN THE EPIDERMAL GROWTH FACTOR RECEPTOR GENE
PROMOTER

BACKGROUND OF THE INVENTION



The present invention claims priority to U.S. Provisional Patent Application Serial No.
60/549,069, filed on March 1, 2004, which is hereby incorporated by reference. The government
owns rights in the present invention pursuant to grant number U01GM61393 from the National
Institutes of Health.

1. Field of the Invention

The present invention relates generally to the fields of molecular biology and oncology.
More particularly, it concerns polymorphisms in the epidermal growth factor receptor (EGFR)
gene associated with EGFR expression and activity. In some embodiments, the present
invention is directed at compositions and methods involving single nucleotide polymorphisms
(SNPs) in the promoter of the EGFR gene that affect EGFR expression.



2. Description of Related Art 

Human epidermal growth factor receptor (EGFR) plays a critical role in the signal
transduction pathway of cell proliferation, differentiation and survival. Overexpression of EGFR
is found in about 30% of human primary tumors. Its activation in these tumors appears to
promote tumor growth by increasing cell proliferation, motility, adhesion, invasive capacity, and
by blocking apoptosis (Tysnes et al., 1997). EGFR overexpression and dysregulation have been
associated with poorer prognosis in patients, and with metastasis, late-stage disease, and
resistance to chemotherapy, hormonal therapy, and radiotherapy (Salomon et al., 1995; Akimoto
et al., 1999; Wosikowski et al., 2000).

The EGFR 5' regulatory region spans about 4 kb, covering 2 kb upstream and 2 kb
downstream of exon 1. The regulatory elements include a promoter region and two separate
enhancer regions. The functions of the EGFR promoter and enhancers are well studied and
documented (Ishii et al., 1985; Haley et al., 1987; Johnson et al., 1988; Kageyama et al., 1988;
Maekawa et al., 1989). Briefly, there is no TATA or CAAT box found in the promoter. Instead,
there are multiple transcription initiation sites (Ishii et al., 1985; Haley et al., 1987; Johnson et
al., 1988; Kageyama et al., 1988). A number of cis- and trans- regulators have been discovered.
These regulators include EGF responsive DNA-binding protein (ERDBP-1), p53, p63, Sp1,
vitamin D-responsive element (VDRE) and estrogen responsive element, which reflects the
complexity of EGFR regulation.

Deoxyribonuclease I footprinting showed that Sp1 can bind to four CCGCCC sequences
(-457 to -440, -365 to -286, -214 to -200, and -110 to -84) in the EGFR gene promoter and may,
therefore, play a vital role in the gene regulation (Johnson et al., 1998). Studies by Gebhardt and
colleagues (1999) demonstrated that a dinucleotide (CA)n repeat polymorphism in intron 1 of
EGFR (near the downstream enhancer), ranging from 14 to 21 repeats, appears to regulate EGFR
expression. The longer allele with 21 repeats showed an 80% reduction of gene expression
compared to the shorter allele with 16 repeats (Gebhardt et al., 1999; Buerger et al., 2000). Data
from studies on the polymorphic CA repeat suggest that this polymorphic site may play a role in
cancer susceptibility (Brandt et al., 2004).

Given the importance of EGFR in tumor biology, several EGFR-targeted cancer therapies
are currently under development. EGFR-targeting agents are typically directed to inhibiting
EGFR phosphorylation or blocking EGF binding. One drug that was recently approved for the
treatment of metastatic non-small cell lung cancer is gefitinib. Gefitinib is a selective EGFR
tyrosine kinase inhibitor that inhibits EGF-stimulated EGFR autophosphorylation.

Because EGFR is the direct target of a number of anticancer drugs, variable expression of
EGFR may directly affect drug response and toxicity. Therefore, polymorphisms in the EGFR
gene relevant to gene expression or activity will be important both to furthering the understanding
of cell signal transduction and to elucidating drug response/toxicity. Studies of the polymorphisms
in the EGFR gene may also be useful for future drug design.

EGFR expression is also associated with diseases other than cancer. For example, an
association was reported between an EGFR microsatellite polymorphism and the rate of
progression of autosomal dominant polycystic kidney disease (ADPKD) (Magistroni et al.,
2003). It has been suggested that mutations that influence the function or expression of EGFR
might predispose to inflammatory bowel disease (Martin et al., 2002). Thus, the identification of
polymorphisms in the EGFR gene relevant to its expression or activity will be important to
further understand the progression of a variety of diseases associated with EGFR dysregulation.

SUMMARY OF THE INVENTION

The present invention discloses twelve polymorphisms in the EGFR 5' regulatory region.
More particularly, the inventors demonstrated that the -216 G>T polymorphism is associated
with increased expression from the EGFR promoter region. The identification of polymorphisms
associated with EGFR expression enables novel methods and compositions for evaluating the
potential efficacy and/or toxicity of an EGFR-targeting therapeutic agent, predicting a patient's
clinical prognosis, and evaluating a patient's risk of developing a disease that is associated with
EGFR dysregulation.

The present invention discloses polymorphic sites in the EGFR gene locus at nucleotide
positions -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, and 2034. The
nucleotide positions -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, and
2034 of the EGFR gene locus are identified by their position in relation to the translation start
site, which is designated +1. There is no nucleotide position designated 0. According to this
nomenclature the nucleotide immediately 5' of +1 is -1, and the nucleotide immediately 3' of +1
is +2. The translation start site (+1) corresponds to nucleotide 9,385 of the EGFR gene locus
(GenBank accession number AF288738, incorporated herein by reference) and nucleotide 505 of
SEQ ID NO: 1. SEQ ID NO: 1 includes nucleotides 8,881 to 9,405 of AF288738.
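
The numbering convention above can be sketched in a few lines of Python. This is only an illustration of the arithmetic implied by the text (no position 0; +1 at nucleotide 9,385 of AF288738; SEQ ID NO: 1 spanning nucleotides 8,881 to 9,405); the function and constant names are hypothetical and are not part of the specification.

```python
# Illustrative sketch only: names below are hypothetical, values come from the text.
TRANSLATION_START_GENBANK = 9385   # AF288738 nucleotide designated +1
SEQ1_FIRST_GENBANK = 8881          # first AF288738 nucleotide in SEQ ID NO: 1
SEQ1_LAST_GENBANK = 9405           # last AF288738 nucleotide in SEQ ID NO: 1


def to_genbank(position: int) -> int:
    """Map a position relative to the translation start (+1) to AF288738 numbering."""
    if position == 0:
        raise ValueError("no nucleotide position is designated 0")
    # Positive positions count +1, +2, ...; the numbering jumps from -1 directly to +1.
    offset = position - 1 if position > 0 else position
    return TRANSLATION_START_GENBANK + offset


def to_seq_id_1(position: int) -> int:
    """Map a relative position to a 1-based index within SEQ ID NO: 1."""
    genbank = to_genbank(position)
    if not SEQ1_FIRST_GENBANK <= genbank <= SEQ1_LAST_GENBANK:
        raise ValueError(f"position {position} lies outside SEQ ID NO: 1")
    return genbank - SEQ1_FIRST_GENBANK + 1


if __name__ == "__main__":
    for pos in (-216, -191, 1):
        print(pos, to_genbank(pos), to_seq_id_1(pos))
```

Under these assumptions, position +1 maps to nucleotide 505 of SEQ ID NO: 1, which agrees with the correspondence stated above.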

The specific polymorphisms discovered by the inventors are -1435 C>T, -1300 G>A,
-1249 G>A, -1227 G>A, -761 C>A, -650 G>A, -544 G>A, -486 C>A, -216 G>T, -191 C>A, 169
G>T, and 2034 G>A. As these polymorphisms are located in the 5' regulatory region of the
EGFR gene, they may be associated with gene regulation.

Thus, in one embodiment, the present invention provides a method for predicting the
expression level of EGFR in a cell or cells comprising determining the sequence at one or more
of nucleotide positions -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, or
2034 on one or both EGFR genes in the cell. Consequently, a patient having such cells could be
predicted to have that general level of EGFR expression. In a preferred embodiment the method
comprises determining the sequence at position -216 in one or both alleles of the EGFR gene in
the cell. The presence of a T at position -216 in one or both alleles is indicative of a higher
expression level. A "higher expression level" is a level of expression that is greater than the
expression level in a cell with a G at position -216 on both alleles of the EGFR gene. The term
"determining" is used according to its plain and ordinary meaning; it means to find out or come
to a decision about by investigation, reasoning, or calculation.

Polymorphisms in linkage disequilibrium with a polymorphism at nucleotide positions
-1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, or 2034 of the EGFR gene
locus may also be used with the methods of the present invention. "Linkage disequilibrium"
("LD" as used herein, though also referred to as "LED" in the art) is used according to its plain
and ordinary meaning to one skilled in the art. LD refers to a situation where a particular
combination of alleles (i.e., a variant form of a given gene) or polymorphisms at two loci appears
more frequently than would be expected by chance. "Significant" as used in respect to linkage
disequilibrium, as determined by one of skill in the art, is contemplated to be a statistical p or α
value that may be 0.25 or 0.1 and may be 0.1, 0.05, 0.001, 0.00001 or less. The relationship
between EGFR haplotypes and the expression level of the EGFR protein may be used to
correlate the genotype (i.e., the genetic makeup of an organism) to a phenotype (i.e., the
physical traits displayed by an organism or cell). "Haplotype" is used according to its plain and
ordinary meaning to one skilled in the art. It refers to the genotype of two or more alleles or
polymorphisms along one of the homologous chromosomes.
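
As a hedged illustration of "appears more frequently than would be expected", the following Python sketch computes two standard population-genetics measures of LD, the coefficient D and the squared correlation r², from two-locus haplotype counts. The specification does not prescribe these measures, and the function name and the counts in the usage example are hypothetical.

```python
# Illustrative sketch only: D and r^2 are standard LD measures, not taken from the text.
def linkage_disequilibrium(hap_counts, allele_a, allele_b):
    """Return (D, r_squared) for allele_a at locus 1 and allele_b at locus 2.

    hap_counts maps (allele_at_locus_1, allele_at_locus_2) to an observed count.
    """
    total = sum(hap_counts.values())
    # Observed frequency of the haplotype carrying both alleles.
    p_ab = hap_counts.get((allele_a, allele_b), 0) / total
    # Marginal allele frequencies at each locus.
    p_a = sum(n for (first, _), n in hap_counts.items() if first == allele_a) / total
    p_b = sum(n for (_, second), n in hap_counts.items() if second == allele_b) / total
    # D is the excess of the observed haplotype frequency over the product expected
    # if the two loci were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


if __name__ == "__main__":
    # Hypothetical haplotype counts for the -216 (G/T) and -191 (C/A) sites.
    counts = {("G", "C"): 60, ("G", "A"): 10, ("T", "C"): 10, ("T", "A"): 20}
    print(linkage_disequilibrium(counts, "T", "A"))
```

A D of zero means the two alleles co-occur exactly as often as independence predicts; whether a nonzero value is "significant" would still be judged by a statistical test at one of the thresholds named above.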

The sequences at, or in linkage disequilibrium with, nucleotide positions -1435, -1300,
-1249, -1227, -761, -650, -544, -486, -216, -191, 169, and 2034 of the EGFR gene locus may be
determined by any method known to those skilled in the art. The sequence may be determined
directly or indirectly. The sequence of a nucleotide position of interest may be determined
indirectly by, for example, determining the nucleotide sequence at a position known to be in
linkage disequilibrium with a specific nucleic acid at the position of interest. Methods for
determining the sequence at a specific nucleotide position include, for example, hybridization
assays, allele specific amplification assays, sequencing assays, microsequencing assays,
invasive cleavage assays, and restriction enzyme assays. In a specific embodiment, the presence
of a -216 G>T polymorphism is determined by digestion with restriction enzyme BseRI. An
allele with a T at position -216 can be cut with BseRI, whereas an allele with a G at position
-216 cannot be cut.
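
The logic of the BseRI assay can be sketched as a small genotype-calling function. This is a minimal illustration that assumes complete digestion and a clean readout of whether uncut product and cut fragments are present; the function name and its boolean inputs are hypothetical and do not describe a specific protocol from the specification.

```python
# Illustrative sketch only: assumes complete BseRI digestion of the amplified region.
def genotype_at_minus_216(uncut_band: bool, cut_bands: bool) -> str:
    """Call the -216 genotype from a BseRI digest.

    uncut_band: full-length product remains (G allele, which BseRI cannot cut).
    cut_bands:  digestion fragments are present (T allele, which BseRI can cut).
    """
    if uncut_band and cut_bands:
        return "G/T"   # one allele resists digestion, the other is cut
    if cut_bands:
        return "T/T"   # all product is cut
    if uncut_band:
        return "G/G"   # no product is cut
    raise ValueError("no product detected; genotype cannot be called")


if __name__ == "__main__":
    print(genotype_at_minus_216(uncut_band=True, cut_bands=True))
```

In practice an incomplete digest would leave uncut product from a T allele and could be misread as a heterozygote, which is why the sketch states complete digestion as an assumption.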

In other embodiments, the invention provides methods for evaluating the potential
efficacy of an EGFR-targeting therapeutic agent for the treatment of a disease associated with
the dysregulation of EGFR in a patient comprising determining the sequence at nucleotide
position -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, or 2034 in one or
both EGFR genes in the patient.

A disease associated with the dysregulation of EGFR may be any disease in which EGFR 
is overexpressed, underexpressed, or expressed at inappropriate times compared to the
expression in comparable normal cells. Examples of diseases associated with the improper
regulation of EGFR expression include cancer, autosomal dominant polycystic kidney disease, 
and inflammatory disorders such as inflammatory bowel disease. 

An EGFR-targeting therapeutic agent may be any agent capable of modulating EGFR
activity either directly or indirectly. EGFR-targeting therapeutic agents known in the art are
typically directed to inhibiting EGFR phosphorylation or blocking EGF binding. Two
EGFR-targeting therapeutic agents have received FDA approval, Iressa (gefitinib) and Erbitux
(cetuximab). Another EGFR-targeting therapeutic agent, Tarceva (erlotinib), is in phase III
trials. Iressa and Tarceva are small molecules, whereas Erbitux is a monoclonal antibody. Other
EGFR-targeting agents modulate EGFR activity by regulating its transcription. For example,
EGFR mRNA production can be stimulated directly or indirectly by treating cells with EGF,
dexamethasone, thyroid hormone, retinoic acids, interferon α, or wild-type p53.

In certain aspects, the present invention provides methods for evaluating the potential
efficacy of an EGFR-targeting therapeutic agent for the treatment of cancer in a patient
comprising determining the sequence at nucleotide position -1435, -1300, -1249, -1227, -761,
-650, -544, -486, -216, -191, 169, or 2034 in one or both EGFR genes in the patient. In some
embodiments the EGFR-targeting therapeutic agent is gefitinib, erlotinib, or cetuximab. In a
preferred embodiment, the sequence at position -216 is determined. In some embodiments, the
presence of a T at position -216 on one or both alleles of the EGFR gene in a patient is an
indicator of decreased efficacy of the EGFR-targeting therapeutic agent as compared to a patient
with a G at position -216 on both alleles.

In some embodiments, the methods of the present invention further comprise obtaining a
sample. A sample may be any sample containing genomic DNA from which the sequence at
nucleotide position -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, or 2034
in one or both EGFR genes can be determined. The sample may be obtained by, for example,
biopsy, venipuncture, aspiration, or swabbing. The sample may be from any tissue or body fluid.
In certain embodiments, the sample comprises buccal cells, mononuclear cells, or cancer cells.

In certain aspects, the methods of the present invention further comprise administering 
the EGFR-targeting therapeutic agent to the patient. 

In other embodiments, the present invention provides methods for predicting the clinical
prognosis for a patient having a disease associated with the dysregulation of EGFR comprising
determining the sequence at, or in linkage disequilibrium with, one or more of nucleotide
positions -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, or 2034 in one or
both EGFR genes in the patient. In some embodiments the polymorphism is -216 G>T. The
presence of a T at position -216 on an allele is an indicator of an increased expression of EGFR
protein. In certain aspects, the increased expression of EGFR protein is predictive of poor
prognosis. In some embodiments, the disease associated with the dysregulation of EGFR is
cancer. For a patient with cancer, poor prognosis may indicate, for example, increased resistance
to chemotherapy, hormonal therapy, or radiotherapy. Poor prognosis may also indicate an
increased risk of metastasis or decreased survival time.

In one embodiment, the present invention provides methods for evaluating a patient's risk
of toxicity to an EGFR-targeting therapeutic agent comprising determining the presence of a
polymorphism at one or more of nucleotide positions -1435, -1300, -1249, -1227, -761, -650,
-544, -486, -216, -191, 169, or 2034 in one or both EGFR genes in the patient. In one aspect, the
polymorphism is -216 G>T. In one embodiment, the presence of a T at position -216 on one or
both alleles is an indicator of decreased toxicity of the EGFR-targeting therapeutic agent.

In other embodiments, the present invention provides methods for evaluating a patient's
risk of developing cancer comprising determining the presence of a polymorphism at one or
more of nucleotide positions -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169,
or 2034 in one or both EGFR genes in the patient. In one embodiment the polymorphism is -216
G>T.

In certain aspects of the present invention, the methods further comprise taking a
patient history, wherein the patient is identified as being at risk for developing cancer or in need
of an EGFR-targeting therapeutic agent.

The present invention also provides kits. In one embodiment, the present invention
provides kits for the detection of a polymorphism at one or more of nucleotide positions -1435,
-1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, or 2034. In some embodiments, the
kit contains a nucleic acid for determining the presence of the polymorphism. The nucleic acid
may be a primer or a probe. In some embodiments, the probe is comprised in an oligonucleotide
array or microarray. In other embodiments, the kit contains a restriction enzyme for determining
the presence of the polymorphism. In certain embodiments, the kit contains both a nucleic acid
and a restriction enzyme. A control nucleic acid may be included in the kit.

In some embodiments, the nucleic acids of the kit comprise 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500 or more
consecutive nucleotides of SEQ ID NO: 2.

In certain aspects, the present invention provides kits for evaluating the potential efficacy
of an EGFR-targeting therapeutic agent in a patient comprising a nucleic acid for determining
the presence of a polymorphism at nucleotide position -1435, -1300, -1249, -1227, -761, -650,
-544, -486, -216, -191, 169, or 2034 in the EGFR gene locus. In other aspects, the present
invention provides kits for evaluating the potential efficacy of an EGFR-targeting therapeutic
agent in a patient comprising a restriction enzyme for determining the presence of a
polymorphism at nucleotide position -1435, -1300, -1249, -1227, -761, -650, -544, -486, -216,
-191, 169, or 2034 in the EGFR gene locus.

It is contemplated that any method or composition described herein can be implemented
with respect to any other method or composition described herein.

The use of the term "or" in the claims is used to mean "and/or" unless explicitly
indicated to refer to alternatives only or the alternatives are mutually exclusive, although the
disclosure supports a definition that refers to only alternatives and "and/or."

Throughout this application, the term "about" is used to indicate that a value includes the
standard deviation of error for the device or method being employed to determine the value.

Following long-standing patent law, the words "a" and "an," when used in conjunction
with the word "comprising" in the claims or specification, denote one or more, unless
specifically noted.

Other objects, features and advantages of the present invention will become apparent
from the following detailed description. It should be understood, however, that the detailed
description and the specific examples, while indicating specific embodiments of the invention,
are given by way of illustration only, since various changes and modifications within the spirit
and scope of the invention will become apparent to those skilled in the art from this detailed
description.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further
demonstrate certain aspects of the present invention. The invention may be better understood by
reference to one or more of these drawings in combination with the detailed description of
specific embodiments presented herein.

WO 2005/085473    PCT/US2005/006559

FIG. 1. FIG. 1 is a map of the EGFR locus. The EGFR regulatory region is
expanded to show the promoter, enhancers, and exon 1. The locations of the 12 single nucleotide
polymorphisms discovered in the regulatory region are indicated by arrows.

FIG. 2. FIG. 2 shows the nucleotide sequence of the EGFR promoter region. The
nucleotide sequence is from -504 to +21 where +1 designates the first nucleotide of the
translation start codon and there is no nucleotide designated 0. The positions of the -216 G>T
polymorphism, -191 C>A polymorphism, Sp1 binding site, transcription initiation site, SacI
cutting site, and the position of the forward primer are also indicated.

FIG. 3. FIG. 3 shows the vector map constructed for the luciferase activity assays.
The 405 bp KpnI-SacI fragment of the EGFR promoter was cloned into the polyclonal site
upstream of the luciferase gene. The positions of primers, RVP3 and GLP2, which were used to
sequence the cloned fragments, are also indicated.

FIG. 4. FIG. 4 shows the expression activity of the four haplotypes for the EGFR
polymorphisms -216 G>T and -191 C>A in transient transfection assays with the luciferase
reporter construct. Relative expression of the luciferase gene was normalized by the Renilla gene
level in the pRL-TK vector.

FIG. 5. FIG. 5 shows an electrophoretic mobility shift assay (EMSA) testing the binding
efficiency of nuclear proteins to the -216G and -216T alleles. The Sp1 consensus probe was used
as a control. The probe and competitor sequences used in the EMSA are listed in Table 4.
Significantly higher binding efficiency of nuclear protein was observed with the -216T allele
(lane 3) compared to the -216G allele (lane 1).

FIG. 6A-B. Transient transfection of pGL3EGFRluc (*1 to *4) in MDA-MB-231,
MCF-7, HEK-293 and SL-2 cells (A). For human cell lines, 1.6 μg pGL3EGFRluc was
co-transfected with 160 ng pRL-TK vector. For SL-2 cells, 300 ng pGL3EGFRluc was
co-transfected with 100 ng pPac-Sp1 vector and relative expression of 200 light units of luciferase
activity/μg total protein/ml was set to 1. A significant difference in promoter activity was observed
between the G-C and T-C haplotypes of -216G/T and -191C/A (all p values are less than 0.04). Data
are shown as mean ± SEM. Relative expression of EGFR among the MDA-MB-231, MCF-7 and
HEK-293 cell lines and the corresponding genotypes of the -216G/T and -191C/A polymorphisms are
shown in (B). EGFR mRNA level was normalized to 1000 copies of the β-actin gene. Experiments
were repeated three times and data are shown as mean ± SEM.
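
The normalization and summary statistics named in the legends for FIG. 4 and FIG. 6 can be sketched as follows. This is a minimal illustration that assumes each replicate yields one reporter luciferase reading and one Renilla reading; the function names and the readings in the usage example are hypothetical, not data from the specification.

```python
# Illustrative sketch only: readings are made-up numbers, not experimental data.
import math
import statistics


def relative_luciferase(reporter: float, renilla: float) -> float:
    """Normalize the reporter luciferase signal to the co-transfected Renilla control."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return reporter / renilla


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    mean = statistics.mean(values)
    # SEM is the sample standard deviation divided by the square root of n.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


if __name__ == "__main__":
    # Three hypothetical replicates of (reporter, Renilla) light units.
    readings = [(200.0, 100.0), (300.0, 100.0), (400.0, 100.0)]
    ratios = [relative_luciferase(r, c) for r, c in readings]
    mean, sem = mean_and_sem(ratios)
    print(f"{mean:.2f} +/- {sem:.2f}")
```

Dividing by the Renilla signal corrects for differences in transfection efficiency between wells, so that haplotype constructs can be compared on a common scale.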

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

A. EPIDERMAL GROWTH FACTOR RECEPTOR

Human epidermal growth factor receptor (EGFR) is a transmembrane protein. Binding
of ligands, such as epidermal growth factor and TGF-α, with its N-terminus on the extracellular
surface induces receptor dimerization and activates the tyrosine kinase activity of the
intracellular domain. Activation of EGFR leads to a cascade of cellular events that ultimately
result in DNA synthesis, and cell proliferation, maturation, survival, and apoptosis.

The expression of EGFR is mainly regulated at the transcription level (Xu et al., 1984).
It has been demonstrated that EGFR mRNA production can be stimulated directly or indirectly
by treating cells with EGF, dexamethasone, thyroid hormone, retinoic acids, interferon α, or
wild-type p53 (Deb et al., 1994; Grandis et al., 1996; Hudson et al., 1989; Subler et al., 1994;
Xu et al., 1993).

The EGFR 5' regulatory region spans about 4 kb, covering 2 kb upstream and 2 kb
downstream of exon 1. The regulatory elements include a promoter region and two separate
enhancer regions. The functions of the EGFR promoter and enhancers are well studied and
documented (Ishii et al., 1985; Haley et al., 1987; Johnson et al., 1988; Kageyama et al., 1988;
Maekawa et al., 1989; each of which is incorporated by reference). Briefly, there is no TATA or
CAAT box found in the promoter. Instead, there are multiple transcription initiation sites (Ishii
et al., 1985; Haley et al., 1987; Johnson et al., 1988; Kageyama et al., 1988). A number of cis-
and trans- regulators have been discovered. These regulators include EGF responsive
DNA-binding protein (ERDBP-1), p53, p63, Sp1, vitamin D-responsive element (VDRE) and estrogen
responsive element, which reflects the complexity of EGFR regulation.

Deoxyribonuclease I footprinting showed that Sp1 can bind to four CCGCCC sequences
(-457 to -440, -365 to -286, -214 to -200, and -110 to -84) in the EGFR gene promoter and may,
therefore, play a vital role in the gene regulation (Johnson et al., 1998). Studies by Gebhardt and
colleagues (1999) demonstrated that a dinucleotide (CA)n repeat polymorphism in intron 1 of
EGFR (near the downstream enhancer), ranging from 14 to 21 repeats, appears to regulate EGFR
expression. The longer allele with 21 repeats showed an 80% reduction of gene expression
compared to the shorter allele with 16 repeats (Gebhardt et al., 1999; Buerger et al., 2000). Data
from studies on the polymorphic CA repeat suggest that this polymorphic site may play a role in
cancer susceptibility (Brandt et al., 2004).

Overexpression of EGFR is found in about 30% of human primary tumors. Its activation
in these tumors appears to promote tumor growth by increasing cell proliferation, motility,
adhesion, invasive capacity, and by blocking apoptosis (Tysnes et al., 1997). EGFR
overexpression and dysregulation have been associated with poorer prognosis in patients, and with
metastasis, late-stage disease, and resistance to chemotherapy, hormonal therapy, and
radiotherapy (Salomon et al., 1995; Akimoto et al., 1999; Wosikowski et al., 2000).

Based on the observation that the overexpression of EGFR is associated with some
cancers and that it appears to promote tumor growth, the identification of polymorphisms in the
EGFR gene relevant to gene expression may be important for predicting an individual's risk of
developing cancer and for predicting a cancer patient's prognosis. In addition, polymorphisms
relevant to EGFR expression could also be used to evaluate toxicity, dosage, and potential
efficacy of EGFR-targeting agents.

Several EGFR-targeted cancer therapies are currently under development.
EGFR-targeting agents are typically directed to inhibiting EGFR phosphorylation or blocking EGF
binding. Two EGFR-targeting drugs have been approved, Iressa (gefitinib) and Erbitux
(cetuximab), and Tarceva (erlotinib) is in phase III trials. Because EGFR is the direct target of a
number of anticancer drugs, variable expression of EGFR may directly affect drug response and
toxicity. Therefore, polymorphisms in the EGFR gene relevant to gene expression or activity
will be important both to furthering the understanding of cell signal transduction and to elucidating
drug response/toxicity. Studies of the polymorphisms in the EGFR gene may also be useful for
future drug design.

EGFR expression is also associated with diseases other than cancer. EGFR is a key
element in renal tubular proliferation. Recently, an association was reported between an EGFR
microsatellite polymorphism and the rate of progression of autosomal dominant polycystic
kidney disease (ADPKD) (Magistroni et al., 2003). It was also demonstrated that inhibiting
EGFR with a specific tyrosine kinase inhibitor (EKI-785) could slow disease progression in a
murine model of ADPKD (Sweeney et al., 1999).

Human EGFR maps to chromosome 7p12, a region that has been linked to inflammatory
bowel disease (Satsangi et al., 1996). Furthermore, a marked increase in EGFR
immunoreactivity has been observed in animal models of colitis (Reinshagen et al., 1993). It has
been suggested that mutations that influence the function or expression of EGFR might
predispose to inflammatory bowel disease (Martin et al., 2002).

Given the importance of EGFR in regulating cell proliferation, polymorphisms in the
EGFR gene relevant to its expression or activity will be important to further understand the
progression of diseases associated with EGFR dysregulation. The present invention has
identified 12 polymorphisms in the 5' regulatory region of the EGFR gene: -1435 C>T, -1300
G>A, -1249 G>A, -1227 G>A, -761 C>A, -650 G>A, -544 G>A, -486 C>A, -216 G>T, -191
C>A, 169 G>T, and 2034 G>A. The polymorphisms are identified in relation to their position
from the translation start site, which is designated +1. According to this nomenclature the
nucleotide immediately 5' of +1 is -1, and the nucleotide immediately 3' of +1 is +2. The
translation start site (+1) corresponds to nucleotide 9,385 of the EGFR gene locus (GenBank
accession number AF288738) and nucleotide 505 of SEQ ID NO: 1. SEQ ID NO: 1 includes
nucleotides 8,881 to 9,405 of the EGFR gene locus.

One SNP, -1249 G>A, is in the upstream enhancer while -216 G>T and -191 C>A are in
the promoter region. Interestingly, -216 G>T is located in an Sp1 binding site and the
replacement of G by T may alter Sp1 binding. The -191 C>A polymorphism is close to a
transcription initiation site. Therefore, these SNPs may have a significant impact on EGFR
transcription.

B. NUCLEIC ACIDS 

Certain embodiments of the present invention concern various nucleic acids, including
promoters, amplification primers, oligonucleotide probes and other nucleic acid elements
involved in the analysis of genomic DNA. In certain aspects, a nucleic acid comprises a
wild-type, a mutant, or a polymorphic nucleic acid.

The term "nucleic acid" is well known in the art. A "nucleic acid" as used herein will
generally refer to a molecule (i.e., strand) of DNA, RNA or a derivative or analog thereof,
comprising a nucleobase. A nucleobase includes, for example, a naturally occurring purine or
pyrimidine base found in DNA (e.g., an adenine "A," a guanine "G," a thymine "T" or a cytosine
"C") or RNA (e.g., an A, a G, a uracil "U" or a C). The term "nucleic acid" encompasses the
terms "oligonucleotide" and "polynucleotide," each as a subgenus of the term "nucleic acid."
The term "oligonucleotide" refers to a molecule of between about 3 and about 100 nucleobases
in length. The term "polynucleotide" refers to at least one molecule of greater than about 100
nucleobases in length. A "gene" refers to coding sequence of a gene product, as well as introns
and the promoter of the gene product. In addition to the EGFR gene, other regulatory regions
such as the promoter and enhancers for EGFR are contemplated as nucleic acids for use with
compositions and methods of the claimed invention.

These definitions generally refer to a single-stranded molecule, but in specific
embodiments will also encompass an additional strand that is partially, substantially or fully
complementary to the single-stranded molecule. Thus, a nucleic acid may encompass a
double-stranded molecule or a triple-stranded molecule that comprises one or more complementary
strand(s) or "complement(s)" of a particular sequence comprising a molecule. As used herein, a
single stranded nucleic acid may be denoted by the prefix "ss", a double stranded nucleic acid by
the prefix "ds", and a triple stranded nucleic acid by the prefix "ts."
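
To make the notion of a fully complementary strand concrete, here is a minimal Python sketch that derives the reverse complement of a DNA sequence by standard Watson-Crick pairing (A with T, G with C). The function name is hypothetical; the example input is the CCGCCC motif mentioned earlier in this document.

```python
# Illustrative sketch only: plain A/C/G/T DNA, no ambiguity codes handled.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5' to 3'."""
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("CCGCCC"))
```

A partially or substantially complementary strand, as the text allows, would differ from this output at one or more positions.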

The term "gene" refers to the segment of DNA involved in producing a polypeptide
chain; it includes regions preceding and following the coding region as well as intervening
sequences (introns) between individual coding segments (exons). A "promoter" is a region of a
nucleic acid sequence at which initiation and rate of transcription are controlled. It may contain
elements at which regulatory proteins and molecules may bind, such as RNA polymerase and
other transcription factors, to initiate the specific transcription of a nucleic acid sequence. The
term "enhancer" refers to a cis-acting regulatory sequence involved in the transcriptional
activation of a nucleic acid sequence. An enhancer can function in either orientation and may be
upstream or downstream of the promoter.

1. Preparation of Nucleic Acids

A nucleic acid may be made by any technique known to one of ordinary skill in the art,
such as, for example, chemical synthesis, enzymatic production or biological production.
Non-limiting examples of a synthetic nucleic acid (e.g., a synthetic oligonucleotide) include a nucleic
acid made by in vitro chemical synthesis using phosphotriester, phosphite, or phosphoramidite
chemistry and solid phase techniques such as described in European Patent 266,032,
incorporated herein by reference, or via deoxynucleoside H-phosphonate intermediates as
described by Froehler et al., 1986 and U.S. Patent 5,705,629, each incorporated herein by
reference. In the methods of the present invention, one or more oligonucleotides may be used.
Various different mechanisms of oligonucleotide synthesis have been disclosed in, for example,
U.S. Patents 4,659,774, 4,816,571, 5,141,813, 5,264,566, 4,959,463, 5,428,148, 5,554,744,
5,574,146, and 5,602,244, each of which is incorporated herein by reference.

A non-limiting example of an enzymatically produced nucleic acid includes one
produced by enzymes in amplification reactions such as PCR™ (see for example, U.S. Patent
4,683,202 and U.S. Patent 4,682,195, each incorporated herein by reference), or the synthesis of
an oligonucleotide described in U.S. Patent 5,645,897, incorporated herein by reference. A
non-limiting example of a biologically produced nucleic acid includes a recombinant nucleic acid
produced (i.e., replicated) in a living cell, such as a recombinant DNA vector replicated in
bacteria (see for example, Sambrook et al., 2001, incorporated herein by reference).

2. Purification of Nucleic Acids

A nucleic acid may be purified on polyacrylamide gels, cesium chloride centrifugation 
gradients, chromatography columns or by any other means known to one of ordinary skill in the 
art (see for example, Sambrook et al., 2001, incorporated herein by reference).

In certain aspects, the present invention concerns a nucleic acid that is an isolated nucleic
acid. As used herein, the term "isolated nucleic acid" refers to a nucleic acid molecule (e.g., an
RNA or DNA molecule) that has been isolated free of, or is otherwise free of, the bulk of the
total genomic and transcribed nucleic acids of one or more cells. In certain embodiments,
"isolated nucleic acid" refers to a nucleic acid that has been isolated free of, or is otherwise free
of, the bulk of cellular components or in vitro reaction components such as, for example,
macromolecules such as lipids or proteins, small biological molecules, and the like.

3. Nucleic Acid Segments 

In certain embodiments, the nucleic acid is a nucleic acid segment. As used herein, the
term "nucleic acid segment" refers to fragments of a nucleic acid, such as, for a non-limiting
example, those that encode only part of an EGFR gene sequence. Thus, a "nucleic acid segment"
may comprise any part of a gene sequence, including from about 2 nucleotides to the full-length
gene including regulatory regions to the polyadenylation signal and any length that includes all
the coding region.

Various nucleic acid segments may be designed based on a particular nucleic acid
sequence, and may be of any length. By assigning numeric values to a sequence, for example,
the first residue is 1, the second residue is 2, etc., an algorithm defining all nucleic acid segments
can be created:

n to n + y

where n is an integer from 1 to the last number of the sequence and y is the length of the
nucleic acid segment minus one, where n + y does not exceed the last number of the sequence.
Thus, for a 10-mer, the nucleic acid segments correspond to bases 1 to 10, 2 to 11, 3 to 12 ... and
so on. For a 15-mer, the nucleic acid segments correspond to bases 1 to 15, 2 to 16, 3 to 17 ...
and so on. For a 20-mer, the nucleic acid segments correspond to bases 1 to 20, 2 to 21, 3 to 22 ...
and so on. In certain embodiments, the nucleic acid segment may be a probe or primer. As used
herein, a "probe" generally refers to a nucleic acid used in a detection method or composition.
As used herein, a "primer" generally refers to a nucleic acid used in an extension or amplification
method or composition.
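
The "n to n + y" rule above can be sketched in a few lines of Python. This is only an illustration of the enumeration described in the text; the function name and the example sequence are hypothetical and are not part of the specification.

```python
def nucleic_acid_segments(sequence: str, length: int) -> list[tuple[int, int, str]]:
    """Enumerate all segments of a given length using the 'n to n + y' rule.

    Positions are 1-based, as in the text: the first residue is 1.
    Returns (n, n + y, segment) tuples.
    """
    last = len(sequence)
    if length < 1 or length > last:
        raise ValueError("segment length must be between 1 and the sequence length")
    y = length - 1  # y is the segment length minus one
    segments = []
    # n runs from 1 upward while n + y does not exceed the last residue number
    for n in range(1, last - y + 1):
        segments.append((n, n + y, sequence[n - 1 : n + y]))
    return segments


# Hypothetical 12-residue sequence: its 10-mers are bases 1 to 10, 2 to 11, 3 to 12.
for start, end, segment in nucleic_acid_segments("ACGTACGTACGT", 10):
    print(start, end, segment)
```

For a sequence of L residues, the rule yields L - y segments of each length.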

4. Nucleic Acid Complements

The present invention also encompasses a nucleic acid that is complementary to a nucleic
acid. A nucleic acid is a "complement" of, or is "complementary" to, another nucleic acid when
it is capable of base-pairing with the other nucleic acid according to the standard Watson-Crick,
Hoogsteen, or reverse Hoogsteen binding complementarity rules. As used herein "another
nucleic acid" may refer to a separate molecule or a spatially separated sequence of the same
molecule. In preferred embodiments, a complement is a hybridization probe or amplification
primer for the detection of a nucleic acid polymorphism.

As used herein, the term "complementary" or "complement" also refers to a nucleic acid
comprising a sequence of consecutive nucleobases or semiconsecutive nucleobases (e.g., one or
more nucleobase moieties are not present in the molecule) capable of hybridizing to another
nucleic acid strand or duplex even if fewer than all of the nucleobases base pair with a
counterpart nucleobase. However, in some diagnostic or detection embodiments, completely
complementary nucleic acids are preferred.
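
As an illustration of complete versus partial complementarity, the following minimal sketch considers standard Watson-Crick pairing only (Hoogsteen and reverse Hoogsteen pairing are not modeled). The function names and example sequences are hypothetical.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA sequence (5'->3')."""
    return "".join(WATSON_CRICK[base] for base in reversed(seq.upper()))


def paired_fraction(probe: str, target: str) -> float:
    """Fraction of probe nucleobases that pair with an equal-length target.

    Both strands are written 5'->3' and are assumed to anneal antiparallel
    over their full length (an assumption of this sketch).
    """
    if len(probe) != len(target):
        raise ValueError("this sketch compares equal-length strands only")
    expected = reverse_complement(target)
    matches = sum(p == e for p, e in zip(probe.upper(), expected))
    return matches / len(probe)


print(reverse_complement("ATGC"))       # the complete complement of ATGC
print(paired_fraction("GCAT", "ATGC"))  # completely complementary
print(paired_fraction("GCAA", "ATGC"))  # one nucleobase without a counterpart
```

A fraction of 1.0 corresponds to the completely complementary case preferred in some diagnostic or detection embodiments.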

C. NUCLEIC ACID DETECTION 

Some embodiments of the invention concern identifying polymorphisms in EGFR,
correlating genotype or haplotype to phenotype, wherein the phenotype is lowered or altered
EGFR activity or expression, and then identifying such polymorphisms in patients who have or




WO 2005/085473



PCT/US2005/006559



will be given EGFR-targeting drugs or compounds. Thus, the present invention involves assays
for identifying polymorphisms and other nucleic acid detection methods. Nucleic acids,
therefore, have utility as probes or primers for embodiments involving nucleic acid
hybridization. They may be used in diagnostic or screening methods of the present invention.
Detection of nucleic acids encoding EGFR, as well as nucleic acids involved in the expression or
stability of EGFR polypeptides or transcripts, is encompassed by the invention. General
methods of nucleic acid detection are provided below, followed by specific examples employed
for the identification of polymorphisms, including single nucleotide polymorphisms (SNPs).

1. Hybridization 

The use of a probe or primer of between 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 50, 60, 70, 80, 90, or 100 nucleotides, preferably between 17 and
100 nucleotides in length, or in some aspects of the invention up to 1-2 kilobases or more in
length, allows the formation of a duplex molecule that is both stable and selective. Molecules
having complementary sequences over contiguous stretches greater than 20 bases in length are
generally preferred, to increase stability and/or selectivity of the hybrid molecules obtained. One
will generally prefer to design nucleic acid molecules for hybridization having one or more
complementary sequences of 20 to 30 nucleotides, or even longer where desired. Such
fragments may be readily prepared, for example, by directly synthesizing the fragment by
chemical means or by introducing selected sequences into recombinant vectors for recombinant
production.

In certain embodiments, the probe or primer comprises 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 50, 60, 70, 80, 90, or 100 consecutive nucleotides of SEQ ID
NO: 1. In some embodiments, the probe or primer comprises 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 50, 60, 70, 80, 90, or 100 consecutive nucleotides of SEQ ID
NO: 2.

Accordingly, the nucleotide sequences of the invention may be used for their ability to
selectively form duplex molecules with complementary stretches of DNAs and/or RNAs or to
provide primers for amplification of DNA or RNA from samples. Depending on the application
envisioned, one would desire to employ varying conditions of hybridization to achieve varying
degrees of selectivity of the probe or primers for the target sequence.

For applications requiring high selectivity, one will typically desire to employ relatively
high stringency conditions to form the hybrids. For example, relatively low salt and/or high
temperature conditions may be used, such as those provided by about 0.02 M to about 0.10 M
NaCl at temperatures of about 50°C to about 70°C. Such high stringency conditions tolerate
little, if any, mismatch between the probe or primers and the template or target strand and would
be particularly suitable for isolating specific genes or for detecting a specific polymorphism. It
is generally appreciated that conditions can be rendered more stringent by the addition of
increasing amounts of formamide. For example, under highly stringent conditions, hybridization
to filter-bound DNA may be carried out in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS),
1 mM EDTA at 65°C, and washing in 0.1 x SSC/0.1% SDS at 68°C (Ausubel et al., 1996).

Conditions may be rendered less stringent by increasing salt concentration and/or
decreasing temperature. For example, a medium stringency condition could be provided by
about 0.1 to 0.25 M NaCl at temperatures of about 37°C to about 55°C, while a low stringency
condition could be provided by about 0.15 M to about 0.9 M salt, at temperatures ranging from
about 20°C to about 55°C. Under less stringent conditions, such as moderately stringent
conditions, the washing may be carried out, for example, in 0.2 x SSC/0.1% SDS at 42°C
(Ausubel et al., 1996). Hybridization conditions can be readily manipulated depending on the
desired results.
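
The salt and temperature ranges given above for high, medium and low stringency can be collected into a small lookup, sketched below. The ranges are those stated in the text; treating the "about" values as hard boundaries, and the class and function names, are assumptions made only for illustration. Because the stated ranges overlap, a given condition may fall into more than one category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stringency:
    name: str
    nacl_molar: tuple[float, float]    # (min, max) NaCl or salt concentration, M
    temp_celsius: tuple[float, float]  # (min, max) temperature, degrees C


# Ranges as stated in the text; "about" is treated as exact in this sketch.
CONDITIONS = (
    Stringency("high", (0.02, 0.10), (50.0, 70.0)),
    Stringency("medium", (0.10, 0.25), (37.0, 55.0)),
    Stringency("low", (0.15, 0.90), (20.0, 55.0)),
)


def matching_stringencies(nacl_molar: float, temp_celsius: float) -> list[str]:
    """Return every stringency category whose ranges contain the condition."""
    return [
        c.name
        for c in CONDITIONS
        if c.nacl_molar[0] <= nacl_molar <= c.nacl_molar[1]
        and c.temp_celsius[0] <= temp_celsius <= c.temp_celsius[1]
    ]


print(matching_stringencies(0.05, 65.0))  # low salt, high temperature
print(matching_stringencies(0.20, 42.0))  # falls in two overlapping ranges
```

The sketch does not model formamide, probe length or G+C content, all of which also affect the stringency actually achieved.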

In other embodiments, hybridization may be achieved under conditions of, for example,
50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 1.0 mM dithiothreitol, at temperatures
between approximately 20°C and about 37°C. Other hybridization conditions utilized could
include approximately 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, at temperatures
ranging from approximately 40°C to about 72°C.

In certain embodiments, it will be advantageous to employ nucleic acids of defined
sequences of the present invention in combination with an appropriate means, such as a label, for
determining hybridization. A wide variety of appropriate indicator means are known in the art,
including fluorescent, radioactive, enzymatic or other ligands, such as avidin/biotin, which are
capable of being detected. In preferred embodiments, one may desire to employ a fluorescent
label or an enzyme tag such as urease, alkaline phosphatase, or peroxidase, instead of radioactive
or other environmentally undesirable reagents. In the case of enzyme tags, colorimetric indicator
substrates are known that can be employed to provide a detection means that is visibly or
spectrophotometrically detectable, to identify specific hybridization with complementary nucleic
acid containing samples. In other aspects, a particular nuclease cleavage site may be present and
detection of a particular nucleotide sequence can be determined by the presence or absence of
nucleic acid cleavage.

In general, it is envisioned that the probes or primers described herein will be useful as
reagents in solution hybridization, as in PCR™, for detection of expression or genotype of
corresponding genes, as well as in embodiments employing a solid phase. In embodiments
involving a solid phase, the test DNA (or RNA) is adsorbed or otherwise affixed to a selected
matrix or surface. This fixed, single-stranded nucleic acid is then subjected to hybridization with
selected probes under desired conditions. The conditions selected will depend on the particular
circumstances (depending, for example, on the G+C content, type of target nucleic acid, source
of nucleic acid, size of hybridization probe, etc.). Optimization of hybridization conditions for
the particular application of interest is well known to those of skill in the art. After washing of
the hybridized molecules to remove non-specifically bound probe molecules, hybridization is
detected, and/or quantified, by determining the amount of bound label. Representative solid
phase hybridization methods are disclosed in U.S. Patents 5,843,663, 5,900,481 and 5,919,626.
Other methods of hybridization that may be used in the practice of the present invention are
disclosed in U.S. Patents 5,849,481, 5,849,486 and 5,851,772. The relevant portions of these
and other references identified in this section of the Specification are incorporated herein by
reference.

2. Amplification of Nucleic Acids 

Nucleic acids used as a template for amplification may be isolated from cells, tissues or
other samples according to standard methodologies (Sambrook et al., 2001). In certain
embodiments, analysis is performed on whole cell or tissue homogenates or biological fluid
samples with or without substantial purification of the template nucleic acid. The nucleic acid
may be genomic DNA or fractionated or whole cell RNA. Where RNA is used, it may be
desired to first convert the RNA to a complementary DNA.

The term "primer," as used herein, is meant to encompass any nucleic acid that is capable
of priming the synthesis of a nascent nucleic acid in a template-dependent process. Typically,
primers are oligonucleotides from ten to twenty and/or thirty base pairs in length, but longer
sequences can be employed. Primers may be provided in double-stranded and/or single-stranded
form, although the single-stranded form is preferred.

Pairs of primers designed to selectively hybridize to nucleic acids corresponding to the
EGFR gene locus (Genbank accession number AF288738) or variants thereof, and fragments
thereof are contacted with the template nucleic acid under conditions that permit selective
hybridization. SEQ ID NO:1 includes nucleotides 8,881 to 9,405 of the EGFR gene locus with
nucleotide 505 of SEQ ID NO:1 corresponding to the translational start site of the EGFR gene;
thus the translational start site is located at nucleotide 9,385 of AF288738. Depending upon the
desired application, high stringency hybridization conditions may be selected that will only allow
hybridization to sequences that are completely complementary to the primers. In other
embodiments, hybridization may occur under reduced stringency to allow for amplification of
nucleic acids that contain one or more mismatches with the primer sequences. Once hybridized,
the template-primer complex is contacted with one or more enzymes that facilitate
template-dependent nucleic acid synthesis. Multiple rounds of amplification, also referred to as
"cycles," are conducted until a sufficient amount of amplification product is produced.
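
The coordinate arithmetic in the preceding paragraph (nucleotide 505 of SEQ ID NO:1 corresponding to nucleotide 9,385 of AF288738) can be checked with a short sketch. The start and end coordinates are taken from the text; the function names are hypothetical.

```python
# Coordinates of SEQ ID NO:1 within the AF288738 locus, as stated in the text.
SEQ1_START_IN_LOCUS = 8881
SEQ1_END_IN_LOCUS = 9405
SEQ1_LENGTH = SEQ1_END_IN_LOCUS - SEQ1_START_IN_LOCUS + 1  # inclusive range


def seq1_to_locus(position: int) -> int:
    """Convert a 1-based SEQ ID NO:1 position to an AF288738 position."""
    if not 1 <= position <= SEQ1_LENGTH:
        raise ValueError("position lies outside SEQ ID NO:1")
    return SEQ1_START_IN_LOCUS + position - 1


def locus_to_seq1(position: int) -> int:
    """Convert an AF288738 position to a 1-based SEQ ID NO:1 position."""
    if not SEQ1_START_IN_LOCUS <= position <= SEQ1_END_IN_LOCUS:
        raise ValueError("position lies outside SEQ ID NO:1")
    return position - SEQ1_START_IN_LOCUS + 1


print(SEQ1_LENGTH)         # number of nucleotides in SEQ ID NO:1
print(seq1_to_locus(505))  # translational start site in locus coordinates
```

Because both numbering schemes are 1-based and inclusive, the conversion is 8,881 + 505 - 1 = 9,385, in agreement with the text.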

The amplification product may be detected, analyzed or quantified. In certain
applications, the detection may be performed by visual means. In certain applications, the
detection may involve indirect identification of the product via chemiluminescence, radioactive 
scintigraphy of incorporated radiolabel or fluorescent label or even via a system using electrical 
and/or thermal impulse signals (Affymax technology; Bellus, 1994). 

A number of template dependent processes are available to amplify the oligonucleotide
sequences present in a given template sample. One of the best known amplification methods is
the polymerase chain reaction (referred to as PCR™) which is described in detail in U.S. Patents
4,683,195, 4,683,202 and 4,800,159, and in Innis et al., 1988, each of which is incorporated
herein by reference in its entirety.
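
As a rough guide to how many cycles yield "a sufficient amount" of product, the idealized exponential model of PCR can be sketched as follows. The per-cycle efficiency parameter and the function names are assumptions of this sketch, not values or terms from the specification; real reactions plateau and rarely sustain perfect doubling.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized product amount after a number of cycles.

    efficiency = 1.0 means perfect doubling in every cycle (an assumption).
    """
    if cycles < 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency in (0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles at which the idealized yield reaches the target."""
    if initial_copies <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("initial_copies must be > 0 and efficiency in (0, 1]")
    cycles = 0
    copies = initial_copies
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(expected_copies(100, 30))     # 100 templates, 30 ideal cycles
print(cycles_needed(1, 1_000_000))  # cycles to exceed a million copies from one
```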

Another method for amplification is ligase chain reaction ("LCR"), disclosed in European 
Application No. 320,308, incorporated herein by reference in its entirety. U.S. Patent 4,883,750 
describes a method similar to LCR for binding probe pairs to a target sequence. A method based
on PCR™ and oligonucleotide ligase assay (OLA) (described in further detail below), disclosed
in U.S. Patent 5,912,148, may also be used.

Alternative methods for amplification of target nucleic acid sequences that may be used
in the practice of the present invention are disclosed in U.S. Patents 5,843,650, 5,846,709,
5,846,783, 5,849,546, 5,849,497, 5,849,547, 5,858,652, 5,866,366, 5,916,776, 5,922,574,
5,928,905, 5,928,906, 5,932,451, 5,935,825, 5,939,291 and 5,942,391, Great Britain Application
2 202 328, and in PCT Application PCT/US89/01025, each of which is incorporated herein by
reference in its entirety. Qbeta Replicase, described in PCT Application PCT/US87/00880, may
also be used as an amplification method in the present invention.

An isothermal amplification method, in which restriction endonucleases and ligases are
used to achieve the amplification of target molecules that contain nucleotide 5'-[alpha-thio]-
triphosphates in one strand of a restriction site, may also be useful in the amplification of nucleic
acids in the present invention (Walker et al., 1992). Strand Displacement Amplification (SDA),
disclosed in U.S. Patent 5,916,779, is another method of carrying out isothermal amplification of
nucleic acids which involves multiple rounds of strand displacement and synthesis, i.e., nick
translation.

Other nucleic acid amplification procedures include transcription-based amplification
systems (TAS), including nucleic acid sequence based amplification (NASBA) and 3SR (Kwoh
et al., 1989; PCT Application WO 88/10315, incorporated herein by reference in their entirety).
European Application 329 822 discloses a nucleic acid amplification process involving cyclically
synthesizing single-stranded RNA ("ssRNA"), ssDNA, and double-stranded DNA (dsDNA),
which may be used in accordance with the present invention.

PCT Application WO 89/06700 (incorporated herein by reference in its entirety)
discloses a nucleic acid sequence amplification scheme based on the hybridization of a promoter
region/primer sequence to a target single-stranded DNA ("ssDNA") followed by transcription of
many RNA copies of the sequence. This scheme is not cyclic, i.e., new templates are not
produced from the resultant RNA transcripts. Other amplification methods include "RACE" and
"one-sided PCR" (Frohman, 1994; Ohara et al., 1989).

3. Detection of Nucleic Acids

Following any amplification, it may be desirable to separate the amplification product
from the template and/or the excess primer. In one embodiment, amplification products are
separated by agarose, agarose-acrylamide or polyacrylamide gel electrophoresis using standard
methods (Sambrook et al., 2001). Separated amplification products may be cut out and eluted
from the gel for further manipulation. Using low melting point agarose gels, the separated band
may be removed by heating the gel, followed by extraction of the nucleic acid.

Separation of nucleic acids may also be effected by spin columns and/or chromatographic
techniques known in the art. There are many kinds of chromatography which may be used in the
practice of the present invention, including adsorption, partition, ion-exchange, hydroxylapatite,
molecular sieve, reverse-phase, column, paper, thin-layer, and gas chromatography as well as
HPLC.

In certain embodiments, the amplification products are visualized, with or without
separation. A typical visualization method involves staining of a gel with ethidium bromide and
visualization of bands under UV light. Alternatively, if the amplification products are integrally
labeled with radio- or fluorometrically-labeled nucleotides, the separated amplification products
can be exposed to x-ray film or visualized under the appropriate excitatory spectra.

In one embodiment, following separation of amplification products, a labeled nucleic
acid probe is brought into contact with the amplified marker sequence. The probe preferably is
conjugated to a chromophore but may be radiolabeled. In another embodiment, the probe is
conjugated to a binding partner, such as an antibody or biotin, or another binding partner
carrying a detectable moiety.

In particular embodiments, detection is by Southern blotting and hybridization with a
labeled probe. The techniques involved in Southern blotting are well known to those of skill in
the art (see Sambrook et al., 2001). One example of the foregoing is described in U.S. Patent
5,279,721, incorporated by reference herein, which discloses an apparatus and method for the
automated electrophoresis and transfer of nucleic acids. The apparatus permits electrophoresis
and blotting without external manipulation of the gel and is ideally suited to carrying out
methods according to the present invention.

Other methods of nucleic acid detection that may be used in the practice of the instant 
invention are disclosed in U.S. Patents 5,840,873, 5,843,640, 5,843,651, 5,846,708, 5,846,717, 
5,846,726, 5,846,729, 5,849,487, 5,853,990, 5,853,992, 5,853,993, 5,856,092, 5,861,244, 
5,863,732, 5,863,753, 5,866,331, 5,905,024, 5,910,407, 5,912,124, 5,912,145, 5,919,630, 
5,925,517, 5,928,862, 5,928,869, 5,929,227, 5,932,413 and 5,935,791, each of which is
incorporated herein by reference. 

4. Other Assays 

Other methods for genetic screening may be used within the scope of the present
invention, for example, to detect mutations in genomic DNA, cDNA and/or RNA samples.
Methods used to detect point mutations include denaturing gradient gel electrophoresis
("DGGE"), restriction fragment length polymorphism analysis ("RFLP"), chemical or enzymatic
cleavage methods, direct sequencing of target regions amplified by PCR™ (see above),
single-strand conformation polymorphism analysis ("SSCP") and other methods well known in
the art.

One method of screening for point mutations is based on RNase cleavage of base pair
mismatches in RNA/DNA or RNA/RNA heteroduplexes. As used herein, the term "mismatch"
is defined as a region of one or more unpaired or mispaired nucleotides in a double-stranded
RNA/RNA, RNA/DNA or DNA/DNA molecule. This definition thus includes mismatches due
to insertion/deletion mutations, as well as single or multiple base point mutations.

U.S. Patent 4,946,773 describes an RNase A mismatch cleavage assay that involves
annealing single-stranded DNA or RNA test samples to an RNA probe, and subsequent
treatment of the nucleic acid duplexes with RNase A. For the detection of mismatches, the
single-stranded products of the RNase A treatment, electrophoretically separated according to
size, are compared to similarly treated control duplexes. Samples containing smaller fragments
(cleavage products) not seen in the control duplex are scored as positive.
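
The scoring logic of a mismatch cleavage assay can be illustrated with a deliberately simplified model. The sketch below assumes that the probe and the sample strand are already aligned position by position, and that the probe is cut immediately after every mismatched position; neither assumption comes from the cited patent, and the names and sequences are hypothetical.

```python
# Watson-Crick pairs allowed in RNA/RNA, RNA/DNA and DNA/DNA duplexes.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatch_positions(probe: str, sample: str) -> list[int]:
    """1-based positions at which aligned probe and sample bases do not pair.

    The sample is written in the opposite polarity, so position i of the
    probe faces position i of the sample (an assumption of this sketch).
    """
    if len(probe) != len(sample):
        raise ValueError("strands must be aligned and of equal length")
    aligned = zip(probe.upper(), sample.upper())
    return [i for i, pair in enumerate(aligned, start=1) if pair not in PAIRS]


def fragment_lengths(probe_length: int, cut_positions: list[int]) -> list[int]:
    """Probe fragment sizes if the probe is cut after each listed position."""
    cuts = sorted(p for p in set(cut_positions) if 1 <= p < probe_length)
    bounds = [0, *cuts, probe_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def scored_positive(probe: str, sample: str, control: str) -> bool:
    """Positive if the sample yields a smaller fragment than the control."""
    sample_frags = fragment_lengths(len(probe), mismatch_positions(probe, sample))
    control_frags = fragment_lengths(len(probe), mismatch_positions(probe, control))
    return min(sample_frags) < min(control_frags)


probe = "AUGGCUAC"    # hypothetical RNA probe
control = "TACCGATG"  # fully paired control strand
variant = "TACCAATG"  # single-base difference at position 5
print(fragment_lengths(len(probe), mismatch_positions(probe, variant)))
print(scored_positive(probe, variant, control))
```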

Other investigators have described the use of RNase I in mismatch assays. The use of
RNase I for mismatch detection is described in literature from Promega Biotech. Promega
markets a kit containing RNase I that is reported to cleave three out of four known mismatches.
Others have described using the MutS protein or other DNA-repair enzymes for detection of
single-base mismatches.

Alternative methods for detection of deletion, insertion or substitution mutations that may
be used in the practice of the present invention are disclosed in U.S. Patents 5,849,483,
5,851,770, 5,866,337, 5,925,525 and 5,928,870, each of which is incorporated herein by
reference in its entirety.

5. Specific Examples of SNP Screening Methods

Spontaneous mutations that arise during the course of evolution in the genomes of
organisms are often not immediately transmitted throughout all of the members of the species,
thereby creating polymorphic alleles that co-exist in the species populations. Often
polymorphisms are the cause of genetic diseases. Several classes of polymorphisms have been
identified. For example, variable nucleotide type polymorphisms (VNTRs) arise from
spontaneous tandem duplications of di- or trinucleotide repeated motifs of nucleotides. If such
variations alter the lengths of DNA fragments generated by restriction endonuclease cleavage,
the variations are referred to as restriction fragment length polymorphisms (RFLPs). RFLPs are
widely used in human and animal genetic analyses.
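
To illustrate how a sequence variation changes restriction fragment lengths, the following sketch digests two hypothetical alleles with EcoRI, which recognizes GAATTC and cuts after the first G. Only one strand of a linear molecule is considered, and the allele sequences and function name are invented for illustration.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the site that remain on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0, *cuts, len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


allele_1 = "AAAAGAATTCTTTTTTTT"  # hypothetical allele carrying the site
allele_2 = "AAAAGAGTTCTTTTTTTT"  # single-base change destroys the site
print(digest(allele_1))  # two fragments
print(digest(allele_2))  # one uncut fragment
```

The two alleles give different fragment patterns on a gel, which is the basis of RFLP analysis.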

Another class of polymorphisms is generated by the replacement of a single nucleotide.
Such single nucleotide polymorphisms (SNPs) rarely result in changes in a restriction
endonuclease site. Thus, SNPs are rarely detectable by restriction fragment length analysis.
SNPs are the most common genetic variations and occur once every 100 to 300 bases, and several
SNP mutations have been found that affect a single nucleotide in a protein-encoding gene in a
manner sufficient to actually cause a genetic disease. SNP diseases are exemplified by
hemophilia, sickle-cell anemia, hereditary hemochromatosis, late-onset Alzheimer disease, etc.

In context of the present invention, polymorphic mutations that affect the activity and/or
levels of the EGFR gene products will be determined by a series of screening methods. One set
of screening methods is aimed at identifying SNPs that affect the inducibility, activity and/or
level of the EGFR gene products in in vitro or in vivo assays. The other set of screening methods
will then be performed to screen an individual for the occurrence of the SNPs identified above.
To do this, a sample (such as blood or other bodily fluid or tissue sample) will be taken from a
patient for genotype analysis. The presence or absence of SNPs will determine the level of
EGFR expression and/or activity. According to methods provided by the invention, these results
will be used to adjust and/or alter the dose of the EGFR-targeting therapeutic agent given to an
individual in order to reduce drug side effects.

SNPs can be the result of deletions, point mutations and insertions. In general any single
base alteration, whatever the cause, can result in a SNP. The greater frequency of SNPs means
that they can be more readily identified than the other classes of polymorphisms. The greater
uniformity of their distribution permits the identification of SNPs "nearer" to a particular trait of
interest. The combined effect of these two attributes makes SNPs extremely valuable. For
example, if a particular trait (e.g., overexpression of EGFR) reflects a mutation at a particular
locus, then any polymorphism that is linked to the particular locus can be used to predict the
probability that an individual will exhibit that trait. In some cases, the SNP may be the cause
of the trait. For example, a SNP in the Sp1 binding site of the EGFR regulatory region may alter
Sp1 binding and thus affect transcription of EGFR.

Several methods have been developed to screen polymorphisms and some examples are
listed below. The references of Kwok and Chen (2003) and Kwok (2001) provide overviews of
some of these methods; both of these references are specifically incorporated by reference.

SNPs relating to the regulation of EGFR gene expression can be characterized by the use 
of any of these methods or suitable modification thereof. Such methods include the direct or 
indirect sequencing of the site, the use of restriction enzymes where the respective alleles of the 
site create or destroy a restriction site, or the use of allele-specific hybridization probes.

Examples of identifying polymorphisms and applying that information in a way that
yields useful information regarding patients can be found, for example, in U.S. Patent No.
6,472,157; U.S. Patent Application Publications 20020016293, 20030099960, 20040203034;
WO 0180896, all of which are hereby incorporated by reference.

a) DNA Sequencing 

The most commonly used method of characterizing a polymorphism is direct DNA
sequencing of the genetic locus that flanks and includes the polymorphism. Such analysis can be
accomplished using either the "dideoxy-mediated chain termination method," also known as the
"Sanger Method" (Sanger et al., 1975) or the "chemical degradation method," also known as the
"Maxam-Gilbert method" (Maxam et al., 1977). Sequencing in combination with genomic
sequence-specific amplification technologies, such as the polymerase chain reaction, may be
utilized to facilitate the recovery of the desired genes (Mullis et al., 1986; European Patent
Application 50,424; European Patent Application 84,796; European Patent Application 258,017;
European Patent Application 237,362; European Patent Application 201,184; U.S. Patents
4,683,202; 4,582,788; and 4,683,194), all of the above incorporated herein by reference.

b) Exonuclease Resistance

Other methods that can be employed to determine the identity of a nucleotide present at a
polymorphic site utilize a specialized exonuclease-resistant nucleotide derivative (U.S. Patent
4,656,127). A primer complementary to an allelic sequence immediately 3'-to the polymorphic
site is hybridized to the DNA under investigation. If the polymorphic site on the DNA contains
a nucleotide that is complementary to the particular exonuclease-resistant nucleotide derivative
present, then that derivative will be incorporated by a polymerase onto the end of the hybridized
primer. Such incorporation makes the primer resistant to exonuclease cleavage and thereby
permits its detection. As the identity of the exonuclease-resistant derivative is known, one can
determine the specific nucleotide present in the polymorphic site of the DNA.

c) Microsequencing Methods

Several other primer-guided nucleotide incorporation procedures for assaying
polymorphic sites in DNA have been described (Komher et al., 1989; Sokolov, 1990; Syvanen,
1990; Kuppuswamy et al., 1991; Prezant et al., 1992; Ugozzoll et al., 1992; Nyren et al., 1993).
These methods rely on the incorporation of labeled deoxynucleotides to discriminate between
bases at a polymorphic site. As the signal is proportional to the number of deoxynucleotides
incorporated, polymorphisms that occur in runs of the same nucleotide result in a signal that is
proportional to the length of the run (Syvanen et al., 1990).
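
The dependence of signal on run length can be made concrete with a small sketch that counts how many identical template bases follow one another from the polymorphic site onward. It assumes the template is written in the direction in which the polymerase reads it and that each incorporated labeled nucleotide contributes one unit of signal; both are simplifications, and the function name is hypothetical.

```python
def run_length_at(template: str, site: int) -> int:
    """Length of the run of identical bases beginning at a 1-based site.

    The template is assumed to be written in the order in which the
    polymerase reads it, starting the run at the polymorphic site.
    """
    if not 1 <= site <= len(template):
        raise ValueError("site lies outside the template")
    base = template[site - 1]
    length = 0
    for current in template[site - 1 :]:
        if current != base:
            break
        length += 1
    return length


print(run_length_at("ACGGGT", 3))  # polymorphic G lies in a run of three Gs
print(run_length_at("ACGT", 2))    # isolated base, single incorporation
```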

d) Extension in Solution

French Patent 2,650,840 and PCT Application WO91/02087 discuss a solution-based
method for determining the identity of the nucleotide of a polymorphic site. According to these
methods, a primer complementary to allelic sequences immediately 3'-to a polymorphic site is
used. The identity of the nucleotide of that site is determined using labeled dideoxynucleotide
derivatives which are incorporated at the end of the primer if complementary to the nucleotide of
the polymorphic site.
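
The geometry of single-base primer extension can be sketched as follows: the primer anneals to the template immediately 3' of the polymorphic site, so the first base added to the primer's 3' end is the complement of the polymorphic nucleotide. The sequences and function names are hypothetical and the model ignores mispriming.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def incorporated_terminator(template: str, primer: str) -> str:
    """Base of the labeled dideoxynucleotide added to the primer's 3' end.

    Template and primer are written 5'->3'. The primer must anneal
    perfectly; the template base just 5' of the annealing site is taken
    to be the polymorphic site.
    """
    binding_site = reverse_complement(primer)
    start = template.upper().find(binding_site)
    if start == -1:
        raise ValueError("primer does not anneal to the template")
    if start == 0:
        raise ValueError("no template base is available for extension")
    polymorphic_base = template.upper()[start - 1]
    return COMPLEMENT[polymorphic_base]


primer = reverse_complement("ACGTTCAGGT")  # anneals just 3' of the site
print(incorporated_terminator("CCTGAGACGTTCAGGT", primer))  # allele with G
print(incorporated_terminator("CCTGAAACGTTCAGGT", primer))  # allele with A
```

Reading which labeled terminator was incorporated therefore identifies the nucleotide at the polymorphic site.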

e) Genetic Bit Analysis or Solid-Phase Extension 

PCT Application WO92/15712 describes a method that uses mixtures of labeled
terminators and a primer that is complementary to the sequence 3' to a polymorphic site. The
labeled terminator that is incorporated is complementary to the nucleotide present in the
polymorphic site of the target molecule being evaluated and is thus identified. Here the primer
or the target molecule is immobilized to a solid phase.

f) Oligonucleotide Ligation Assay (OLA) 

This is another solid phase method that uses different methodology (Landegren et al.,
1988). Two oligonucleotides, capable of hybridizing to abutting sequences of a single strand of
a target DNA, are used. One of these oligonucleotides is biotinylated while the other is
detectably labeled. If the precise complementary sequence is found in a target molecule, the
oligonucleotides will hybridize such that their termini abut, and create a ligation substrate.
Ligation permits the recovery of the labeled oligonucleotide by using avidin. Other nucleic acid
detection assays, based on this method, combined with PCR have also been described (Nickerson
et al., 1990). Here PCR is used to achieve the exponential amplification of target DNA, which is
then detected using the OLA.

g) Ligase/Polymerase-Mediated Genetic Bit Analysis 

U.S. Patent 5,952,174 describes a method that also involves two primers capable of
hybridizing to abutting sequences of a target molecule. The hybridized product is formed on a
solid support to which the target is immobilized. Here the hybridization occurs such that the
primers are separated from one another by a space of a single nucleotide. Incubating this
hybridized product in the presence of a polymerase, a ligase, and a nucleoside triphosphate
mixture containing at least one deoxynucleoside triphosphate allows the ligation of any pair of
abutting hybridized oligonucleotides. Addition of a ligase results in two events required to
generate a signal, extension and ligation. This provides a higher specificity and lower "noise"
than methods using either extension or ligation alone and, unlike the polymerase-based assays,
this method enhances the specificity of the polymerase step by combining it with a second
hybridization and a ligation step for a signal to be attached to the solid phase.

h) Invasive Cleavage Reactions

Invasive cleavage reactions can be used to evaluate cellular DNA for a particular
polymorphism. A technology called INVADER® employs such reactions (e.g., de Arruda et al.,
2002; Stevens et al., 2003, which are incorporated by reference). Generally, there are three
nucleic acid molecules: 1) an oligonucleotide upstream of the target site ("upstream oligo"), 2) a
probe oligonucleotide covering the target site ("probe"), and 3) a single-stranded DNA with
the target site ("target"). The upstream oligo and probe do not overlap but they contain
contiguous sequences. The probe contains a donor fluorophore, such as fluorescein, and an
acceptor dye, such as Dabcyl. The nucleotide at the 3' terminal end of the upstream oligo
overlaps ("invades") the first base pair of a probe-target duplex. Then the probe is cleaved by a
structure-specific 5' nuclease causing separation of the fluorophore/quencher pair, which
increases the amount of fluorescence that can be detected. See Lu et al., 2004.

In some cases, the assay is conducted on a solid-surface or in an array format. 

i) Other Methods To Detect SNPs

Several other specific methods for SNP detection and identification are presented below
and may be used as such or with suitable modifications in conjunction with identifying
polymorphisms of the EGFR gene in the present invention. Several other methods are also
described on the SNP web site of the NCBI at the website www.ncbi.nlm.nih.gov/SNP,
incorporated herein by reference.

In a particular embodiment, extended haplotypes may be determined at any given locus
in a population, which allows one to identify exactly which SNPs will be redundant and which
will be essential in association studies. The latter are referred to as 'haplotype tag SNPs (htSNPs)',
markers that capture the haplotypes of a gene or a region of linkage disequilibrium. See Johnson
et al. (2001) and Ke and Cardon (2003), each of which is incorporated herein by reference, for
exemplary methods.

The VDA-assay utilizes PCR amplification of genomic segments by long PCR methods
using TaKaRa LA Taq reagents and other standard reaction conditions. The long amplification
can amplify DNA sizes of about 2,000-12,000 bp. Hybridization of products to variant detector
array (VDA) can be performed by an Affymetrix High Throughput Screening Center and
analyzed with computerized software.

A method called Chip Assay uses PCR amplification of genomic segments by standard or
long PCR protocols. Hybridization products are analyzed by VDA (Halushka et al., 1999,
incorporated herein by reference). SNPs are generally classified as "Certain" or "Likely" based
on computer analysis of hybridization patterns. By comparison to alternative detection methods
such as nucleotide sequencing, "Certain" SNPs have been confirmed 100% of the time, and
"Likely" SNPs have been confirmed 73% of the time by this method.

Other methods simply involve PCR amplification following digestion with the relevant
restriction enzyme. Yet others involve sequencing of purified PCR products from known
genomic regions.

In yet another method, individual exons or overlapping fragments of large exons are
PCR-amplified. Primers are designed from published or database sequences and PCR-amplification
of genomic DNA is performed using the following conditions: 200 ng DNA
template, 0.5 µM each primer, 80 µM each of dCTP, dATP, dTTP and dGTP, 5% formamide,
1.5 mM MgCl2, 0.5 U of Taq polymerase and 0.1 volume of the Taq buffer. Thermal cycling is
performed and resulting PCR-products are analyzed by PCR-single strand conformation
polymorphism (PCR-SSCP) analysis, under a variety of conditions, e.g., 5 or 10%
polyacrylamide gel with 15% urea, with or without 5% glycerol. Electrophoresis is performed
overnight. PCR-products that show mobility shifts are reamplified and sequenced to identify
nucleotide variation.

In a method called CGAP-GAI (DEMIGLACE), sequence and alignment data (from a
PHRAP .ace file), quality scores for the sequence base calls (from PHRED quality files), distance
information (from PHYLIP dnadist and neighbour programs) and base-calling data (from
PHRED '-d' switch) are loaded into memory. Sequences are aligned and examined for each
vertical chunk ('slice') of the resulting assembly for disagreement. Any such slice is considered a
candidate SNP (DEMIGLACE). A number of filters are used by DEMIGLACE to eliminate
slices that are not likely to represent true polymorphisms. These include filters that: (i) exclude
sequences in any given slice from SNP consideration where neighboring sequence quality scores
drop 40% or more; (ii) exclude calls in which peak amplitude is below the fifteenth percentile of
all base calls for that nucleotide type; (iii) disqualify regions of a sequence having a high
number of disagreements with the consensus from participating in SNP calculations; (iv)
remove from consideration any base call with an alternative call in which the peak takes up 25%
or more of the area of the called peak; (v) exclude variations that occur in only one read
direction. PHRED quality scores are converted into probability-of-error values for each
nucleotide in the slice. Standard Bayesian methods are used to calculate the posterior probability
that there is evidence of nucleotide heterogeneity at a given location.
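
The conversion of PHRED quality scores into probability-of-error values mentioned above can
be illustrated with a minimal sketch. It assumes the standard PHRED definition
Q = -10 log10(P); the function names and the example scores are hypothetical and are not
taken from DEMIGLACE itself.

```python
def phred_to_error_probability(q):
    """Convert a PHRED quality score Q into a base-call error probability.

    Uses the standard PHRED definition Q = -10 * log10(P_error).
    """
    return 10 ** (-q / 10.0)


def slice_error_probabilities(quality_scores):
    """Map every PHRED score of one alignment slice to an error probability."""
    return [phred_to_error_probability(q) for q in quality_scores]


# Hypothetical quality scores for the base calls covering one slice.
example_scores = [20, 30, 40]
for q, p in zip(example_scores, slice_error_probabilities(example_scores)):
    print(f"Q{q}: P(error) = {p:.4g}")
```

Under this definition a quality value of 40 corresponds to an error rate of 1 in 10,000 bases.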

In a method called CU-RDF (RESEQ), PCR amplification is performed from DNA
isolated from blood using specific primers for each SNP and, after typical cleanup protocols to
remove unused primers and free nucleotides, direct sequencing is performed using the same or
nested primers.

In a method called DEBNICK (METHOD-B), a comparative analysis of clustered EST
sequences is performed and confirmed by fluorescent-based DNA sequencing. In a related
method, called DEBNICK (METHOD-C), comparative analysis of clustered EST sequences
with phred quality > 20 at the site of the mismatch, average phred quality >= 20 over 5 bases
5'-FLANK and 3' to the SNP, no mismatches in 5 bases 5' and 3' to the SNP, and at least two
occurrences of each allele is performed and confirmed by examining traces.
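
The METHOD-C criteria can be read as a simple filter over the reads of one EST cluster. The
sketch below is only one possible interpretation: the dictionary layout of a read, the function
names, and the choice to count alleles only among reads that pass the quality checks are all
assumptions made for illustration.

```python
from collections import Counter


def read_passes(read):
    """Per-read checks: site quality > 20, mean flank quality >= 20, clean flanks."""
    flank = read["flank_qualities"]  # phred values of the 5 bases 5' and 5 bases 3'
    return (
        read["site_quality"] > 20
        and len(flank) > 0
        and sum(flank) / len(flank) >= 20
        and read["flank_mismatches"] == 0
    )


def is_candidate_snp(reads):
    """Accept a site if at least two alleles are each seen in two or more passing reads."""
    counts = Counter(r["allele"] for r in reads if read_passes(r))
    return len(counts) >= 2 and all(n >= 2 for n in counts.values())


def make_read(allele, site_quality=30):
    """Build a hypothetical read record with uniform flank quality."""
    return {
        "allele": allele,
        "site_quality": site_quality,
        "flank_qualities": [30] * 10,
        "flank_mismatches": 0,
    }


cluster = [make_read("A"), make_read("A"), make_read("G"), make_read("G")]
print(is_candidate_snp(cluster))
```

Candidates accepted by such a filter would still be confirmed by examining traces, as the
method describes.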

In a method identified as ERO (RESEQ), new primer sets were designed for
electronically published STSs and used to amplify DNA from 10 different mouse strains. The
amplification product from each strain is then gel purified and sequenced using a standard
dideoxy, cycle sequencing technique with 33P-labeled terminators. All the ddATP terminated
reactions are then loaded in adjacent lanes of a sequencing gel followed by all of the ddGTP
reactions and so on. SNPs are identified by visually scanning the radiographs.

In another method identified as ERO (RESEQ-HT), new primer sets were designed for
electronically published murine DNA sequences and used to amplify DNA from 10 different
mouse strains. The amplification product from each strain is prepared for sequencing by treating
with Exonuclease I and Shrimp Alkaline Phosphatase. Sequencing is performed using ABI
Prism Big Dye Terminator Ready Reaction Kit (Perkin-Elmer) and sequence samples are run on
the 3700 DNA Analyzer (96 Capillary Sequencer).

FGU-CBT (SCA2-SNP) identifies a method where the region containing the SNP is PCR
amplified using the primers SCA2-FP3 and SCA2-RP3. Approximately 100 ng of genomic
DNA is amplified in a 50 µl reaction volume containing a final concentration of 5 mM Tris,
25 mM KCl, 0.75 mM MgCl2, 0.05% gelatin, 20 pmol of each primer and 0.5 U of Taq DNA
polymerase. Samples are denatured, annealed and extended and the PCR product is purified
from a band cut out of the agarose gel using, for example, the QIAquick gel extraction kit
(Qiagen) and is sequenced using dye terminator chemistry on an ABI Prism 377 automated DNA
sequencer with the PCR primers.

In a method identified as JBLACK (SEQ/RESTRICT), two independent PCR reactions
are performed with genomic DNA. Products from the first reaction are analyzed by sequencing,
indicating a unique FspI restriction site. The mutation is confirmed in the product of the second
PCR reaction by digesting with FspI.

In a method described as KWOK(1), SNPs are identified by comparing high quality
genomic sequence data from four randomly chosen individuals by direct DNA sequencing of
PCR products with dye-terminator chemistry (see Kwok et al., 1996). In a related method
identified as KWOK(2), SNPs are identified by comparing high quality genomic sequence data
from overlapping large-insert clones such as bacterial artificial chromosomes (BACs) or
P1-based artificial chromosomes (PACs). An STS containing this SNP is then developed and the
existence of the SNP in various populations is confirmed by pooled DNA sequencing (see
Taillon-Miller et al., 1998). In another similar method called KWOK(3), SNPs are identified by
comparing high quality genomic sequence data from overlapping large-insert clones BACs or
PACs. The SNPs found by this approach represent DNA sequence variations between the two
donor chromosomes but the allele frequencies in the general population have not yet been
determined. In method KWOK(5), SNPs are identified by comparing high quality genomic
sequence data from a homozygous DNA sample and one or more pooled DNA samples by direct
DNA sequencing of PCR products with dye-terminator chemistry. The STSs used are developed
from sequence data found in publicly available databases. Specifically, these STSs are amplified
by PCR against a complete hydatidiform mole (CHM) that has been shown to be homozygous at
all loci and a pool of DNA samples from 80 CEPH parents (see Kwok et al., 1994).

In another such method, KWOK (OverlapSnpDetectionWithPolyBayes), SNPs are
discovered by automated computer analysis of overlapping regions of large-insert human
genomic clone sequences. For data acquisition, clone sequences are obtained directly from
large-scale sequencing centers. This is necessary because base quality sequences are not
present/available through GenBank. Raw data processing involves analysis of clone sequences
and accompanying base quality information for consistency. Finished ("base perfect", error rate
lower than 1 in 10,000 bp) sequences with no associated base quality sequences are assigned a
uniform base quality value of 40 (1 in 10,000 bp error rate). Draft sequences without base
quality values are rejected. Processed sequences are entered into a local database. A version of
each sequence with known human repeats masked is also stored. Repeat masking is performed
with the program "MASKERAID." Overlap detection: Putative overlaps are detected with the
program "WUBLAST." Several filtering steps follow in order to eliminate false overlap
detection results, i.e., similarities between a pair of clone sequences that arise due to sequence
duplication as opposed to true overlap. The filters consider total length of overlap, overall
percent similarity, and the number of sequence differences between nucleotides with high base
quality value ("high-quality mismatches"). Results are also compared to results of restriction
fragment mapping of genomic clones at Washington University Genome Sequencing Center,
finisher's reports on overlap, and results of the sequence contig building effort at the NCBI.
SNP detection: Overlapping pairs of clone sequence are analyzed for candidate SNP sites with
the "POLYBAYES" SNP detection software. Sequence differences between the pair of sequences
are scored for the probability of representing true sequence variation as opposed to sequencing
error. This process requires the presence of base quality values for both sequences. High-scoring
candidates are extracted. The search is restricted to substitution-type single base pair
variations. The confidence score of a candidate SNP is computed by the POLYBAYES software.

In a method identified by KWOK (TaqMan assay), the TaqMan assay is used to
determine genotypes for 90 random individuals. In a method identified by KYUGEN(Q1), DNA
samples of indicated populations are pooled and analyzed by PLACE-SSCP. Peak heights of
each allele in the pooled analysis are corrected by those in a heterozygote, and are subsequently
used for calculation of allele frequencies. Allele frequencies higher than 10% are reliably
quantified by this method. Allele frequency = 0 (zero) means that the allele was found among
individuals, but the corresponding peak is not seen in the examination of the pool. Allele
frequency = 0-0.1 indicates that minor alleles are detected in the pool but the peaks are too low
to reliably quantify.
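
The heterozygote-based correction of pooled peak heights can be sketched as follows. The
exact formula used by KYUGEN is not given in the text; this sketch assumes that the two
alleles are present in equal amounts in a heterozygote, so that the ratio of the heterozygote
peak heights measures the signal bias between alleles. All names and numbers are hypothetical.

```python
def pooled_allele_frequency(pool_a, pool_b, het_a, het_b):
    """Estimate the frequency of allele A in a DNA pool from peak heights.

    pool_a, pool_b: peak heights of alleles A and B in the pooled sample.
    het_a, het_b:   peak heights of A and B in a heterozygous individual,
                    where both alleles are assumed to be present 1:1.
    """
    if pool_a < 0 or pool_b < 0 or het_a <= 0 or het_b <= 0:
        raise ValueError("peak heights must be non-negative (heterozygote peaks positive)")
    bias = het_a / het_b          # how much stronger allele A signals than allele B
    corrected_a = pool_a / bias   # rescale A onto the signal scale of B
    total = corrected_a + pool_b
    if total == 0:
        raise ValueError("no signal in pooled sample")
    return corrected_a / total


# Hypothetical example: allele A gives twice the signal of allele B per copy.
freq_a = pooled_allele_frequency(pool_a=50, pool_b=75, het_a=200, het_b=100)
print(f"estimated frequency of allele A: {freq_a:.2f}")
```

Estimates below about 0.1 would fall into the range that the method describes as too low to
quantify reliably.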

In yet another method identified as KYUGEN (Method1), PCR products are post-labeled
with fluorescent dyes and analyzed by an automated capillary electrophoresis system under
SSCP conditions (PLACE-SSCP). Four or more individual DNAs are analyzed with or without
two pooled DNA (Japanese pool and CEPH parents pool) in a series of experiments. Alleles are
identified by visual inspection. Individual DNAs with different genotypes are sequenced and
SNPs identified. Allele frequencies are estimated from peak heights in the pooled samples after
correction of signal bias using peak heights in heterozygotes. The PCR primers are tagged to
have 5'-ATT or 5'-GTT at their ends for post-labeling of both strands. Samples of DNA (10
ng/µl) are amplified in reaction mixtures containing the buffer (10 mM Tris-HCl, pH 8.3 or 9.3,
50 mM KCl, 2.0 mM MgCl2), 0.25 µM of each primer, 200 µM of each dNTP, and 0.025 units/µl
of Taq DNA polymerase premixed with anti-Taq antibody. The two strands of PCR products are
differentially labeled with nucleotides modified with R110 and R6G by an exchange reaction of
Klenow fragment of DNA polymerase I. The reaction is stopped by adding EDTA, and
unincorporated nucleotides are dephosphorylated by adding calf intestinal alkaline phosphatase.
For the SSCP, an aliquot of fluorescently labeled PCR products and TAMRA-labeled internal
markers are added to deionized formamide, and denatured. Electrophoresis is performed in a
capillary using an ABI Prism 310 Genetic Analyzer. Genescan software (P-E Biosystems) is
used for data collection and data processing. DNA of individuals including those who showed
different genotypes on SSCP are subjected to direct sequencing using big-dye terminator
chemistry, on ABI Prism 310 sequencers. Multiple sequence trace files obtained from ABI
Prism 310 are processed and aligned by Phred/Phrap and viewed using Consed viewer. SNPs
are identified by PolyPhred software and visual inspection.

In yet another method identified as KYUGEN (Method2), individuals with different
genotypes are searched by denaturing HPLC (DHPLC) or PLACE-SSCP (Inazuka et al., 1997)
and then sequences are determined to identify SNPs. PCR is performed with primers tagged
with 5'-ATT or 5'-GTT at their ends for post-labeling of both strands. DHPLC analysis is carried
out using the WAVE DNA fragment analysis system (Transgenomic). PCR products are
injected into DNASep column, and separated under the conditions determined using
WAVEMaker program (Transgenomic). The two strands of PCR products are differentially
labeled with nucleotides modified with R110 and R6G by an exchange reaction of Klenow
fragment of DNA polymerase I. The reaction is stopped by adding EDTA, and unincorporated
nucleotides are dephosphorylated by adding calf intestinal alkaline phosphatase. SSCP followed
by electrophoresis is performed in a capillary using an ABI Prism 310 Genetic Analyzer with
Genescan software (P-E Biosystems). DNA of individuals including those who showed
different genotypes on DHPLC or SSCP are subjected to direct sequencing using big-dye
terminator chemistry, on ABI Prism 310 sequencer. Multiple sequence trace files obtained from
ABI Prism 310 are processed and aligned by Phred/Phrap and viewed using Consed viewer.
SNPs are identified by PolyPhred software and visual inspection.

Trace chromatogram data of EST sequences in Unigene are processed with PHRED. To identify
likely SNPs, single base mismatches are reported from multiple sequence alignments produced
by the programs PHRAP, BRO and POA for each Unigene cluster. BRO corrected possible
misreported EST orientations, while POA identified and analyzed non-linear alignment
structures indicative of gene mixing/chimeras that might produce spurious SNPs. Bayesian
inference is used to weigh evidence for true polymorphism versus sequencing error,
misalignment or ambiguity, misclustering or chimeric EST sequences, assessing data such as
raw chromatogram height, sharpness, overlap and spacing; sequencing error rates;
context-sensitivity; cDNA library origin, etc.

In a method identified as MARSHFIELD (Method-B), overlapping human DNA sequences
which contained putative insertion/deletion polymorphisms are identified through searches of
public databases. PCR primers which flanked each polymorphic site are selected from the
consensus sequences. Primers are used to amplify individual or pooled human genomic DNA.
Resulting PCR products are resolved on a denaturing polyacrylamide gel and a PhosphorImager
is used to estimate allele frequencies from DNA pools.

6. Linkage Disequilibrium 

Polymorphisms in linkage disequilibrium with the polymorphism at -1435, -1300, -1249,
-1227, -761, -650, -544, -486, -216, -191, 169, or 2034 of the EGFR gene locus may also be used
with the methods of the present invention. "Linkage disequilibrium" ("LD" as used herein,
though also referred to as "LED" in the art) refers to a situation where a particular combination
of alleles (i.e., a variant form of a given gene) or polymorphisms at two loci appears more
frequently than would be expected by chance. "Significant" as used in respect to linkage
disequilibrium, as determined by one of skill in the art, is contemplated to be a statistical p or α
value that may be 0.25 or 0.1 and may be 0.1, 0.05, 0.001, 0.00001 or less. The relationship
between EGFR haplotypes and the expression level of the EGFR protein may be used to
correlate the genotype (i.e., the genetic make up of an organism) to a phenotype (i.e., the
physical traits displayed by an organism or cell). "Haplotype" is used according to its plain and
ordinary meaning to one skilled in the art. It refers to a collective genotype of two or more
alleles or polymorphisms along one of the homologous chromosomes.
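
As an illustration of how linkage disequilibrium between two biallelic sites may be quantified,
the sketch below computes the coefficient D and its normalized form D' from four haplotype
frequencies, using the standard textbook definitions. The frequencies shown are hypothetical
and are not taken from the data of the present disclosure; significance testing is not included.

```python
def linkage_disequilibrium(f11, f12, f21, f22):
    """Return (D, D') for two biallelic loci from haplotype frequencies.

    f11: frequency of haplotype carrying allele 1 at both loci
    f12: allele 1 at the first locus, allele 2 at the second
    f21: allele 2 at the first locus, allele 1 at the second
    f22: allele 2 at both loci
    """
    total = f11 + f12 + f21 + f22
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = f11 + f12              # allele 1 frequency at the first locus
    q1 = f11 + f21              # allele 1 frequency at the second locus
    d = f11 - p1 * q1           # observed minus expected under independence
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical haplotype frequencies for illustration only.
d, d_prime = linkage_disequilibrium(0.5, 0.1, 0.1, 0.3)
print(f"D = {d:.3f}, D' = {d_prime:.3f}")
```

A D' near 1 indicates strong disequilibrium, whereas values near 0 indicate that the two sites
segregate almost independently.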

D. KITS

Any of the compositions described herein may be comprised in a kit. In a non-limiting
example, reagents for determining the genotype of one or both EGFR genes are included in a kit.
The kit may further include individual nucleic acids that can amplify and/or detect particular
nucleic acid sequences of the EGFR gene. In specific embodiments, it includes one or more
primers and/or probes. Nucleic acid molecules may have a label, dye, or other signalling
molecule attached to them, such as a fluorophore. The kit may also include one or more buffers,
such as a DNA isolation buffer, an amplification buffer or a hybridization buffer. The kit may
also contain compounds and reagents to prepare DNA templates and isolate DNA from a sample.
The kit may also include various labeling reagents and compounds.

The components of the kits may be packaged either in aqueous media or in lyophilized
form. The container means of the kits will generally include at least one vial, test tube, flask,
bottle, syringe or other container means, into which a component may be placed, and preferably,
suitably aliquoted. Where there is more than one component in the kit (labeling reagent and
label may be packaged together), the kit also will generally contain a second, third or other
additional container into which the additional components may be separately placed. However,
various combinations of components may be comprised in a vial. The kits of the present
invention also will typically include a means for containing the nucleic acids, and any other
reagent containers in close confinement for commercial sale. Such containers may include
injection or blow-molded plastic containers into which the desired vials are retained.

When the components of the kit are provided in one or more liquid solutions, the
liquid solution is an aqueous solution, with a sterile aqueous solution being particularly
preferred. However, the components of the kit may be provided as dried powder(s). When
reagents and/or components are provided as a dry powder, the powder can be reconstituted by
the addition of a suitable solvent. It is envisioned that the solvent may also be provided in
another container means.

A kit will also include instructions for employing the kit components as well as the use of
any other reagent not included in the kit. Instructions may include variations that can be
implemented.

It is contemplated that such reagents are embodiments of kits of the invention. Such kits,
however, are not limited to the particular items identified above and may include any reagent
used directly or indirectly in the detection of polymorphisms in the EGFR gene or the expression
level of the EGFR gene.

E. EXAMPLES 

The following examples are included to demonstrate preferred embodiments of the
invention. It should be appreciated by those of skill in the art that the techniques disclosed in the
examples which follow represent techniques discovered by the inventor to function well in the
practice of the invention, and thus can be considered to constitute preferred modes for its
practice. However, those of skill in the art should, in light of the present disclosure, appreciate
that many changes can be made in the specific embodiments which are disclosed and still obtain
a like or similar result without departing from the spirit and scope of the invention.

EXAMPLE 1 

Discovery of Single Nucleotide Polymorphisms (SNPs) in EGFR Regulatory Region

DNA samples from Coriell Cell Repository were used for resequencing. The samples
include 22 Caucasians, 23 African-Americans and 23 Asians. For SNP discovery, PCR was used
to amplify the approximately 4.5 kb fragment containing the upstream and downstream
enhancer, promoter, exon 1 and part of intron 1 using the primers in Table 1. Purified PCR
products were directly sequenced from both ends. An ABI-3700 capillary sequencer and a
phred/phrap/polyphred/consed pipeline (World Wide Web at phrap.org/) were used to identify
the polymorphisms.

Table 1

SEQ ID NO:   Name          Sequence
3            EGFR1L-F      GTTCCACTGTTGTGCTTCCC
4            EGFR1L-R      AAGAAAGTTGGGAGCGGTTC
5            EGFR1L-AF     GGGTGGACTTTGCCAAAGGA
6            EGFR1L-AR     CTTAGAGCCAGCGTCGGATA
7            EGFR1L-1F     GCATGACTTCAACGCACAGT
8            EGFR1L-1R     GAGGCTAAGTGTCCCACTGC
9            EGFR1L-2F     TCGGACTTTAGAGCACCACC
10           EGFR1L-2R     GAGGAGGAGAATGCGAGGAG
11           EGFR1L-3F     AAATTAACTCCTCAGGGCACC
12           EGFR1L-3R     CGCCCTTACCTTTCTTTTCC
13           EGFR1L-4F     CCCTGACTCCGTCCAGTATT
14           EGFR2L-F      (XJTCCTTTCCTGTTTCCTTG
15           EGFR2L-R      ACCAGCTGTGGGAAAGTCAC
16           EGFR2L-1R     AGACGAGTTCTCCCAGCTCC
17           EGFR2L-2F     GCGCAGGTCTCAAACTGAAG
18           EGFR2L-2R     GGAGAAGTTTGCTGTGAGCC
19           EGFR2L-3F     CCCTCGTCTTGCCTATCCA
20           EGFR2L-3R     AGTGATCCCCAAATCTGGCT
21           EGFR2L-4F     GGCATAGAACAGTGGTTCCC
22           EGFR2L-4R     GAACACCAATGGAGGGAGAA
23           EGFR2L-5F     TGAAGGAACTGGTGGAAAGG
24           EGFR2L-5R     CATGTCCCAGAACCAAACAA



By resequencing 4 kb of the EGFR 5' regulatory region, including the promoter and
enhancers, twelve single nucleotide polymorphisms were identified from 68 DNA samples
consisting of 22 Caucasians, 23 African-Americans and 23 Asians (FIG. 1 and Table 2). Five
SNPs showed relatively higher frequency (rare allele frequency > 10%) in at least one population
compared to the other seven rare ones. Nine SNPs were observed in the promoter or enhancer
regions and three of these were frequent. One SNP, -1249 G>A (10% in African-Americans), is
in the upstream enhancer while -216 G>T (29% in African-Americans and 34% in Caucasians)
and -191 C>A (18% in Caucasians) are in the promoter region (FIG. 1 and Table 2).
Interestingly, -216 G>T is located in an Sp1 binding site (-216) and the replacement of G by T
may alter the Sp1 binding. Meanwhile, the -191 C>A is close to a transcription initiation site
(FIG. 2) (Ishii et al., 1985; Haley et al., 1987; Johnson et al., 1988; Kageyama et al., 1988).
Therefore, these SNPs may have a significant impact on the EGFR transcription.

EXAMPLE 2

Functional Characterization of Two Promoter SNPs (-216 G>T and -191 C>A)

The potential functions of two SNPs (-216 G>T and -191 C>A) in the EGFR promoter region
were characterized by in vitro transient transfection assay and electrophoretic mobility shift
assay (EMSA).

Haplotypes. The SNP -216 G>T was found to be frequent in African-Americans (29%)
and Caucasians (34%) but relatively rare in Asians (9%), while -191 C>A was only found in
Caucasians (18%) when the inventors sequenced the 68 samples from different ethnic groups
(Table 3). Linkage disequilibrium and haplotype analysis showed that -216 G>T and -191 C>A
are not in strong LD (D' = 0.5562, p>0.05) and three haplotypes were observed in the samples:
G-C, G-A and T-C, see Table 3 below. DNA fragments containing these three haplotypes were
amplified and cloned while the T-A haplotype was constructed by ligating the T fragment and A
fragment from the Dra III digested G-A and T-C haplotypes (FIG. 3).

Table 3. The haplotype frequency of -216G>T and -191C>A in Caucasian, African-American
and Asian populations.

Haplotype    Caucasian    African-American    Asian
G-C          0.48         0.71                0.92
G-A          0.18         0.00                0.00
T-C          0.34         0.29                0.08
T-A          0.00         0.00                0.00



Vectors and detecting system. pGL3-luc+ basic reporter vector (Promega) carrying
each of the four target DNA fragments and pRL-TK reporter vector (Promega) containing the
Renilla luciferase gene driven by the herpes simplex virus thymidine kinase (HSV-TK) promoter
were co-transfected into MDA-MB-231 cells to compare the relative expression of the luciferase
gene. A Dual-Luciferase reporter assay system (Promega) was used to detect the expression level
of luciferase. pGL3-luc+ basic vector and pGL3-luc+ SV40-promoter vector were used as negative
and positive controls, respectively.

Deletion mapping studies have shown that a 384 bp fragment upstream of exon 1
containing these two SNPs has the essential promoter function (FIG. 2) (Johnson et al., 1988).
This fragment was therefore amplified from the individuals with specific haplotypes by PCR
using ProofStart DNA polymerase (Qiagen), which is modified for high-fidelity DNA
amplification. Primers were designed to amplify the 515 bp amplicon indicated in FIG. 2. The
primer sequences were: forward primer, 5'-CCACCGGTACCGGCGGCCGCTGGCCTTG-3'
(SEQ ID NO: 25), and reverse primer, 5'-CGGCGAGACACGCCCTTACCTTT-3' (SEQ ID NO:
26). This 515 bp amplicon contains a SacI cutting site at the 3' end (FIG. 2). To facilitate the
subcloning, the forward primer was designed to contain a KpnI site. The fragment was digested
by KpnI and SacI and a 405 bp product was then cloned into the KpnI/SacI site of pGL3-luc+
basic vector. To confirm the inserted DNA fragments, all plasmids were sequenced to exclude
PCR errors, check the orientation of the fragment, and assure the haplotypes before transfection.
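
As a small illustration of the subcloning design described above, the following sketch scans a
sequence for restriction enzyme recognition sites. It relies on the well-known recognition
sequences of KpnI (GGTACC) and SacI (GAGCTC); the function name and the use of 0-based
positions are choices made for this example only. Because both sites are palindromic, scanning
one strand is sufficient.

```python
# Recognition sequences of the two enzymes used for subcloning.
RECOGNITION_SITES = {"KpnI": "GGTACC", "SacI": "GAGCTC"}

# Forward primer (SEQ ID NO: 25) as given in the text.
FORWARD_PRIMER = "CCACCGGTACCGGCGGCCGCTGGCCTTG"


def find_sites(sequence, enzyme):
    """Return the 0-based start positions of an enzyme's recognition site."""
    site = RECOGNITION_SITES[enzyme]
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # allow overlapping matches
    return positions


print("KpnI sites in forward primer:", find_sites(FORWARD_PRIMER, "KpnI"))
print("SacI sites in forward primer:", find_sites(FORWARD_PRIMER, "SacI"))
```

Applied to the full amplicon sequence, the same scan could be used to check the expected size
of the KpnI/SacI digestion product before cloning.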

Transient transfection. The MDA-MB-231 cell line was maintained in RPMI 1640
media (Invitrogen) with 10% FBS and 2 mM L-Glutamine. Transient transfection was performed
by Transfectamine2000 (Invitrogen) according to the manufacturer's instructions. All
transfections were performed in triplicate, and repeated three times. Cells were co-transfected
with pRL-TK vector to normalize the transfection efficiency. After transfection, cells were
cultured for 24 hours, washed, lysed, and analyzed using the Dual Luciferase kit (Promega)
according to the manufacturer's instructions.

The in vitro transcriptional efficiency of luciferase driven by the four haplotypes was
compared. Significantly higher luciferase activity in the T-C haplotype vector was observed
than in the G-C haplotype vector (FIG. 4, p<0.01). The T-C and G-C haplotypes are the most
frequent haplotypes in Caucasian, African-American, and Asian populations (Table 3). In
addition, the -216 G>T polymorphism contributed more to luciferase activity than the -191 C>A
polymorphism (FIG. 4; FIG. 6A; p<0.04 for all comparisons). This effect was independent of the
EGFR expression level of the cells (FIG. 6B). On average, the substitution of the G allele by the
T allele demonstrated about a 30% increase in luciferase gene expression.
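
The normalization behind such comparisons can be sketched as follows: the firefly luciferase
signal of each well is divided by the co-transfected Renilla signal, triplicates are averaged,
and the result is expressed relative to a reference construct. The readings below are invented
for illustration and the function names are hypothetical; they do not reproduce the data of
FIG. 4 or FIG. 6.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Per-well firefly/Renilla ratios, correcting for transfection efficiency."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(test, reference):
    """Mean normalized activity of a test construct relative to a reference.

    test, reference: (firefly_readings, renilla_readings) tuples of replicate wells.
    """
    return mean(normalized_activity(*test)) / mean(normalized_activity(*reference))


# Hypothetical triplicate readings (arbitrary light units).
g_c = ([100, 110, 90], [10, 11, 9])    # reference haplotype construct
t_c = ([130, 143, 117], [10, 11, 9])   # test haplotype construct
print(f"T-C relative to G-C: {fold_change(t_c, g_c):.2f}-fold")
```

With these invented readings the test construct shows a 1.3-fold activity, i.e. a 30% increase
over the reference.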

To further confirm the potential cooperative effect of the DNA alteration and Sp1 on
promoter activity, transient transfection was also performed in the Drosophila melanogaster
Schneider cell line 2 (SL-2) in which Sp1 is deficient (Courey et al., 1988). As a result,
co-transfection of pGL3EGFRluc with Sp1 expression vector resulted in about 100-fold induction
of promoter activity compared to transfection of pGL3EGFRluc alone. Co-transfection of
pPac-Sp1 and each of four pGL3EGFRluc constructs demonstrated a significantly lower promoter
activity driven by the G-C haplotype compared to the T-C haplotype (p<0.03, FIG. 6A).

Electrophoretic Mobility Shift Assay (EMSA). EMSA was used to evaluate nuclear
protein binding at the -216G>T polymorphic site. Nuclear proteins were extracted from
MDA-MB-231 cells using the NE-PER Nuclear and Cytoplasmic Extraction Reagents according to
the manufacturer's protocol (Pierce, Rockford, USA). The probes and competitors corresponding
to the G allele, T allele, and Sp1 binding consensus sequence are listed in Table 4.

Table 4. Probes and competitors used for EMSA. The position of the polymorphic nucleotide was
bolded and underlined.

Name                        Sequence

G allele probe
  GPF (SEQ ID NO: 27):      5'-biotin-GCAGCCTCCGCCCCCCGCACGGTGT-3'
  GPR (SEQ ID NO: 28):      5'-biotin-ACACCGTGCGGGGGGCGGAGGCTGC-3'
G allele competitor
  GCF (SEQ ID NO: 29):      5'-GCAGCCTCCGCCCCCCGCACGGTGT-3'
  GCR (SEQ ID NO: 30):      5'-ACACCGTGCGGGGGGCGGAGGCTGC-3'
T allele probe
  TPF (SEQ ID NO: 31):      5'-biotin-GCAGCCTCCTCCCCCCGCACGGTGT-3'
  TPR (SEQ ID NO: 32):      5'-biotin-ACACCGTGCGGGGGGAGGAGGCTGC-3'
T allele competitor
  TCF (SEQ ID NO: 33):      5'-GCAGCCTCCTCCCCCCGCACGGTGT-3'
  TCR (SEQ ID NO: 34):      5'-ACACCGTGCGGGGGGAGGAGGCTGC-3'
Sp1 control probe
  Sp1PF (SEQ ID NO: 35):    5'-biotin-ATTCGATCGGGGCGGGGCGAGC-3'
  Sp1PR (SEQ ID NO: 36):    5'-biotin-GCTCGCCCCGCCCCGATCGAAT-3'
Sp1 competitor
  Sp1CF (SEQ ID NO: 37):    5'-ATTCGATCGGGGCGGGGCGAGC-3'
  Sp1CR (SEQ ID NO: 38):    5'-GCTCGCCCCGCCCCGATCGAAT-3'
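
Because the forward and reverse oligonucleotides in Table 4 are annealed into double-stranded probes, each reverse strand should be the reverse complement of its forward strand, and the G and T allele probes should differ at a single position. A minimal sketch of that consistency check is given below; the sequences are those of SEQ ID NOs: 27, 28, 31, 32, 35 and 36, while the helper names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def differing_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# (forward strand, reverse strand) pairs as listed in Table 4.
OLIGOS = {
    "G allele": ("GCAGCCTCCGCCCCCCGCACGGTGT", "ACACCGTGCGGGGGGCGGAGGCTGC"),
    "T allele": ("GCAGCCTCCTCCCCCCGCACGGTGT", "ACACCGTGCGGGGGGAGGAGGCTGC"),
    "Sp1 control": ("ATTCGATCGGGGCGGGGCGAGC", "GCTCGCCCCGCCCCGATCGAAT"),
}

for name, (forward, reverse) in OLIGOS.items():
    print(name, reverse_complement(forward) == reverse)

# Position of the polymorphic nucleotide within the 25-mer forward probes.
print(differing_positions(OLIGOS["G allele"][0], OLIGOS["T allele"][0]))
```

All three pairs are complementary, and the G and T allele forward probes differ only at position 10 of the 25-mer.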

Probes were synthesized as single strands and end-labeled with biotin. Unlabeled
oligonucleotides with the same sequences were used as competitors. Double-stranded DNA was
made by annealing two complementary oligonucleotides. EMSA was performed using the
LightShift Chemiluminescent EMSA Kit (Pierce, Rockford, USA) according to the
manufacturer's instructions.

Briefly, binding reactions were performed by incubating the nuclear extracts with the
binding buffer (100 mM Tris-HCl, pH 7.5; 500 mM NaCl; 25 mM MgCl2; and 5 mM
dithiothreitol), 1 µg poly(dI-dC), and 0.2 pmol (200,000 cpm) labeled probe for 20 minutes at
room temperature. For competition assays, a 100-fold molar excess of unlabeled oligonucleotides
(specific, nonspecific, or Sp1 specific) was included in the binding reaction. After binding, the
samples were separated in a 5% nondenaturing polyacrylamide gel in 0.5x TBE for 2 hours at
4°C. Binding reactions were then transferred to a nylon membrane (Amersham Pharmacia
Biotech) by electrophoresis in 0.5x TBE at 100 V for 40 minutes. After transfer, DNA was
cross-linked at 120 mJ/cm^2 under 254 nm UV light. Biotin-labeled DNA was detected and
visualized using the chemiluminescence-based detection procedure in the ChemiDoc system
(Bio-Rad).

EMSA was performed to test the binding efficiency of nuclear proteins to each
allele-specific probe. The Sp1 consensus probe was used as the control to show the binding and
position of shifting. Significantly higher binding efficiency of nuclear protein from
MDA-MB-231 cells was observed with the T allele probe compared to the G allele probe (FIG. 5).

Haplotypes of -216G/T-191C/A Were Associated with EGFR mRNA Expression in
vivo. Human fibroblast cells (which express EGFR) were selected to evaluate the association
between -216G/T-191C/A haplotypes and EGFR transcription. According to previous
reports, there are multiple transcription initiation sites in the EGFR promoter (Johnson et al.,
1988; Kageyama et al., 1988), while the major site for in vivo transcription is at position -260
(Kageyama et al., 1988). Thus, the positions -216 and -191 would be present in most EGFR
mRNA sequences. Ten cell lines with diplotype G-C/T-C for the two polymorphisms were
chosen so that there was the potential to detect a difference in expression level between mRNA
carrying the T-C haplotype and the G-C haplotype within the same cell. A significant
deviation of the average relative ratio from the hypothetical ratio of 1:1 was observed (Mean of R
= 1.39±0.12, 95% CI 1.11-1.67, p<0.02), demonstrating that EGFR mRNA derived from the T-C
haplotype was about 40% higher than that from the G-C haplotype. This finding indicates that
the -216G/T variant also has a strong impact on EGFR transcription in vivo.
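
The reported summary statistics can be checked against a one-sample t-test of the mean allelic ratio against the null ratio of 1. The sketch below assumes, which the text does not state explicitly, that ±0.12 is the standard error of the mean over the ten cell lines (9 degrees of freedom). The critical value 2.262 is the standard two-sided 95% Student's t value for 9 degrees of freedom.

```python
def one_sample_t_summary(mean_ratio, sem, null_value, t_crit):
    """t statistic and confidence interval computed from summary statistics only."""
    t_stat = (mean_ratio - null_value) / sem
    half_width = t_crit * sem
    return t_stat, (mean_ratio - half_width, mean_ratio + half_width)


# Assumption: 1.39 +/- 0.12 is mean +/- standard error, n = 10 cell lines (df = 9).
T_CRIT_95_DF9 = 2.262  # two-sided 95% critical value of Student's t, 9 degrees of freedom

t_stat, (ci_low, ci_high) = one_sample_t_summary(1.39, 0.12, 1.0, T_CRIT_95_DF9)
print(round(t_stat, 2), round(ci_low, 2), round(ci_high, 2))
```

Under that assumption the interval comes out at roughly 1.12 to 1.66, close to the reported 1.11-1.67 (the small difference is consistent with rounding of the mean and standard error), and the t statistic of about 3.25 exceeds the critical value, in line with the reported significance.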

In addition to the allelic imbalance, the relative expression of EGFR among the above
three human cell lines was evaluated by real-time PCR. Interestingly, the EGFR levels among
these cells were in agreement with their diplotypes, with a dramatically high level of EGFR in
MDA-MB-231 cells, about 6-fold less in HEK293, and the lowest in MCF-7 (FIG. 6B).
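
Relative expression from real-time PCR is commonly derived with the comparative Ct (2^-ΔΔCt) calculation. The passage above does not say which quantification method was used, so the following is only a sketch of that common approach, assuming near-100% amplification efficiency; the Ct values and the function name are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold expression of a target gene in a sample relative to a calibrator sample.

    Comparative Ct method: each target Ct is first normalized to a reference
    (housekeeping) gene, then the sample is compared with the calibrator.
    Assumes both amplicons double every cycle.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the sample crosses threshold two cycles later than
# the calibrator after normalization, i.e. one quarter of the expression.
print(relative_expression(24.0, 18.0, 22.0, 18.0))
```

A difference of about 2.6 normalized cycles would correspond to the roughly 6-fold difference mentioned above, since 2^2.6 is approximately 6.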

All of the compositions and methods disclosed and claimed herein can be made and
executed without undue experimentation in light of the present disclosure. While the
compositions and methods of this invention have been described in terms of preferred
embodiments, it will be apparent to those of skill in the art that variations may be applied to the
compositions and methods and in the steps or in the sequence of steps of the methods described
herein without departing from the concept, spirit and scope of the invention. More specifically,
it will be apparent that certain agents which are both chemically and physiologically related may
be substituted for the agents described herein while the same or similar results would be
achieved. All such similar substitutes and modifications apparent to those skilled in the art are

CLAIMS

1. The method for evaluating the potential efficacy of an EGFR-targeting therapeutic agent
for the treatment of cancer in a patient comprising determining the sequence of a
polymorphism in one or both EGFR genes in the patient.

2. The method of claim 1, wherein the polymorphism is at, or in linkage disequilibrium
with, a nucleotide position selected from the group consisting of nucleotide positions
-1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, and 2034.

3. The method of claim 2, wherein the polymorphism is, or is in linkage disequilibrium
with, a polymorphism selected from the group consisting of -1435 C>T, -1300 G>A, -1249
G>A, -1227 G>A, -761 C>A, -650 G>A, -544 G>A, -486 C>A, -216 G>T, -191 C>A,
169 G>T, and 2034 G>A.

4. The method of claim 1, further comprising determining the sequence of at least two
polymorphisms in one or both EGFR genes in the patient.

5. The method of claim 1, wherein the EGFR-targeting therapeutic agent is an EGFR-
tyrosine kinase inhibitor.

6. The method of claim 5, wherein the EGFR-tyrosine kinase inhibitor is gefitinib or
erlotinib.

7. The method of claim 1, wherein the EGFR-targeting therapeutic agent is a monoclonal
antibody.

8. The method of claim 7, wherein the monoclonal antibody is cetuximab.

9. The method of claim 3, wherein the polymorphism is -216 G>T.

10. The method of claim 9, wherein a T at position -216 on an allele is an indicator of
higher expression of EGFR protein, and further wherein the higher expression of EGFR
protein is an indicator of decreased efficacy of the EGFR-targeting therapeutic agent.

11. The method of claim 1, further comprising determining the sequence of a polymorphism
in both EGFR genes in the patient.

12. The method of claim 1, wherein determining the sequence of a polymorphism is
performed by a hybridization assay.

13. The method of claim 1, wherein determining the sequence of a polymorphism is
performed by an allele specific amplification assay.

14. The method of claim 1, wherein determining the sequence of a polymorphism is
performed by a sequencing or a microsequencing assay.

15. The method of claim 1, wherein determining the sequence of a polymorphism is
performed by digestion with a restriction enzyme.

16. The method of claim 1, further comprising obtaining a sample.

17. The method of claim 16, wherein the sample comprises buccal cells, mononuclear cells,
or cancer cells.

18. The method of claim 1, further comprising administering the EGFR-targeting
therapeutic agent to the patient.

19. A method for predicting the clinical prognosis for a cancer patient comprising
determining the sequence of a polymorphism in one or both EGFR genes in the patient.

20. The method of claim 19, further comprising determining the sequence of a
polymorphism in both EGFR genes in the patient.

21. The method of claim 19, wherein the polymorphism is at, or in linkage disequilibrium
with, a nucleotide position selected from the group consisting of nucleotide positions
-1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, and 2034.

22. The method of claim 21, wherein the polymorphism is, or is in linkage disequilibrium
with, a polymorphism selected from the group consisting of -1435 C>T, -1300 G>A, -1249
G>A, -1227 G>A, -761 C>A, -650 G>A, -544 G>A, -486 C>A, -216 G>T, -191 C>A,
169 G>T, and 2034 G>A.

23. The method of claim 22, wherein the polymorphism is -216 G>T.

24. The method of claim 23, wherein a T at position -216 on an allele is an indicator of an
increased expression of EGFR protein.

25. The method of claim 24, wherein the increased expression of EGFR protein is predictive
of poor prognosis.

26. The method of claim 25, wherein the poor prognosis indicates increased resistance to
chemotherapy, hormonal therapy, or radiotherapy.

27. The method of claim 25, wherein the poor prognosis indicates increased risk of
metastasis.

28. A method for evaluating a patient's risk of toxicity to an EGFR-targeting therapeutic
agent comprising determining the sequence of a polymorphism in one or both EGFR
genes in the patient.

29. The method of claim 28, wherein the polymorphism is at, or in linkage disequilibrium
with, a nucleotide position selected from the group consisting of nucleotide positions
-1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, and 2034.

30. The method of claim 29, wherein the polymorphism is, or is in linkage disequilibrium
with, a polymorphism selected from the group consisting of -1435 C>T, -1300 G>A, -1249
G>A, -1227 G>A, -761 C>A, -650 G>A, -544 G>A, -486 C>A, -216 G>T, -191 C>A,
169 G>T, and 2034 G>A.

31. The method of claim 30, wherein the polymorphism is -216 G>T.

32. The method of claim 31, wherein a T at position -216 on one or both alleles is an
indicator of decreased toxicity of the EGFR-targeting therapeutic agent.

33. The method of claim 28, further comprising determining the sequence of a
polymorphism in both EGFR genes in the patient.

34. A method for predicting the expression level of EGFR in a cell comprising determining
the sequence at position -216 in one or both alleles of the EGFR gene in the cell,
wherein a T at position -216 in one or both alleles is indicative of a higher expression
level.

35. A method for evaluating the potential efficacy of an EGFR-targeting therapeutic agent
for the treatment of a disease associated with the dysregulation of EGFR in a patient
comprising determining the sequence of a polymorphism in one or both EGFR genes in
the patient.

36. A kit for evaluating the potential efficacy of an EGFR-targeting therapeutic agent in a
patient comprising a nucleic acid for determining the sequence of a polymorphism in an
EGFR gene locus.

37. The kit of claim 36, wherein the nucleic acid is a primer for amplifying a polymorphism
at a nucleotide position selected from the group consisting of -1435, -1300, -1249,
-1227, -761, -650, -544, -486, -216, -191, 169, and 2034.

38. The kit of claim 36, wherein the nucleic acid is a specific hybridization probe designed
to detect a polymorphism at a nucleotide position selected from the group consisting of
-1435, -1300, -1249, -1227, -761, -650, -544, -486, -216, -191, 169, and 2034.

39. The kit of claim 38, wherein the specific hybridization probe is comprised in an
oligonucleotide array or microarray.

40. A kit for evaluating the potential efficacy of an EGFR-targeting therapeutic agent in a
patient comprising a restriction enzyme for determining the sequence of a polymorphism
in an EGFR gene locus.
<210> 1
<211> 525
<212> DNA
<213> Homo sapiens
<400> 1
gaaattaact cctcagggca cccgctcccc tcccatgcgc cgccccactc ccgacggaga 60
ctaggtcccg cgggggccac cgctgtccac cgcctccggc ggccgctggc cttgggtccc 120
cgctgctggt tctcctccct cctcctcgca ttctcctcct cctctgctcc tcccgatccc 180
tcctccgccg cctggtccct cctcctcccg ccctgcctcc ccgcgcctcg gcccgcgcga 240
gctagacgtc cgggcagccc ccggcgcagc gcggccgcag cagcctccgc cccccgcacg 300
gtgtgagcgc ccgacgcggc cgaggcggcc ggagtcccga gctagccccg gcggccgccg 360
ccgcccagac cggacgacag gccacctcgt cggcgtccgc ccgagtcccc gcctcgccgc 420
caacgccaca accaccgcgc acggccccct gactccgtcc agtattgatc gggagagccg 480
gagcgagctc ttcggggagc agcgatgcga ccctccggga cggcc 525



<210> 2
<211> 4990
<212> DNA
<213> Homo sapiens
<400> 2
ctccacagag gctgtgagct agagccctaa ctgtgcaggg ccctaactat gccaggctac 60
ttatctctct taagaggact tcattagtgc ctgctcggcc atacagtttt ttacttacca 120
agtaacacag ttatcagcac actccaggta ctagccaagg actacaaaat caacgtgaat 180
gtcagctttt gtatcaaaag ctcaaaggag aaactcaaac tttacataga tgtcccatga 240
agatgttcag caaacccatt cttctctgtt ccctggaatc catcccagta ttgtgctatg 300
tgtgtgtcta gtaattcttt acaaaaagct ctgtttcttg tgatgctatc agatcacatt 360
gaagaatata caagccgtac tatgaaggct gttgtctcat atagtcctaa cgtagtgaga 420
actgatgttc ttacatgctg tctttttggg cactcaaaga aattcctgta cagtcttaca 480
aatcagttgt agcttaaatt gatttgtgtt gtgacttgta cacacaggtc acattccctt 540
gacagaaaat atagtttaaa accaaatttg cagcccttgt taagtgaatg cacaggactt 600
tattgtattc aggtctttta ttgtaagact cactcctgtc ttcattttat gttccactgt 660
tgtgcttccc atttgccttt ctctagtttt gttttctgtg tttctacgga ctgctctcag 720
cccaggtgtg caggaagcac acacatgcct gcagagcctt catggcctct gcattcaggg 780
catgacttca acgcacagtg gctgtactga tttgttaaaa caaaggaaca gattacttct 840
cctaattcac agggaagttc caggttgtgc gggcagtgag cagacctgtg tctgtctgcg 900
cttgccctgg tgaaaaaccc caccgttcag gctgcagggt gcgagaccca ggcacaaaca 960
ttttgctgga tgaggaggaa agatgtaagg ttgctcccct tcagagacag caaagggcag 1020
gtctgtagct tcacttactt caggattgtg atttttgaca gagccgagag atcagggttg 1080
ttgaaccagg cctgaaggtc ctagtgaatc tcgtgaagag aggaggggtc tggctgtaac 1140
atggacctag aggacatttt tactgcagga gaaggaacag tggggatggg gtggacttgc 1200
caaaggaata tagctcaagt tcctgcagcc caaaaaagct cagtttcttt tggccaaagc 1260
ttccgcgagt ttccctggca tttctcctgc gggagctaca ggggcagtgg gacacttagc 1320
ctctctaaaa gcacctccac ggctgtttgt gtcaagcctt tattccaaga gcttcacttt 1380
tgcgaagtaa tgtgcttcac acattggctt caaagtaccc atggctggtt gcaataaaca 1440
ttaaggaggc ctgtctctgc acccggagtt gggtgccctc atttcagatg atttcgaggg 1500
tgcttgacaa gatctgaagg accctcggac tttagagcac cacctcggac gcctggcacc 1560
cctgccgcgc gggcacggcg acctcctcag ctgccaggcc agcctctgat ccccgagagg 1620
gtcccgtagt gctgcagggg aggtggggac ccgaataaag gagcagtttc cccgtcggtg 1680
ccattatccg acgctggctc taaggctcgg ccagtctgtc taaagctggt acaagtttgc 1740
tttgtaaaac aaaagaaggg aaagggggaa ggggaccctg gcacagattt ggctcgacct 1800
ggacataggc tgggcctgca agtccgcggg gaccgggtcc agaggggcag tgctgggaac 1860
gcccctctcg gaaattaact cctcagggca cccgctcccc tcccatgcgc cgccccactc 1920
ccgccggaga ctaggtcccg cgggggccac cgctgtccac cgcctccggc ggccgctggc 1980
cttgggtccc cgctgctggt tctcctccct cctcctcgca ttctcctcct cctctgctcc 2040
tcccgatccc tcctccgccg cctggtccct cctcctcccg ccctgcctcc ccgcgcctcg 2100
gcccgcgcga gctagacgtc cgggcagccc ccggcgcagc gcggccgcag cagcctccgc 2160
cccccgcacg gtgtgagcgc ccgacgcggc cgaggcggcc ggagtcccga gctagccccg 2220
gcggccgccg ccgcccagac cggacgacag gccacctcgt cggcgtccgc ccgagtcccc 2280
gcctcgccgc caacgccaca accaccgcgc acggccccct gactccgtcc agtattgatc 2340
gggagagccg gagcgagctc ttcggggagc agcgatgcga ccctccggga cggccggggc 2400
agcgctcctg gcgctgctgg ctgcgctctg cccggcgagt cgggctctgg aggaaaagaa 2460
aggtaagggc gtgtctcgcc ggctcccgcg ccgcccccgg atcgcgcccc ggaccccgca 2520
gcccgcccaa ccgcgcaccg gcgcaccggc tcggcgcccg cgcccccgcc cgtcctttcc 2580
tgtttccttg agatcagctg cgccgccgac cgggaccgcg ggaggaacgg gacgtttcgt 2640
tcttcggccg ggagagtctg gggcgggcgg aggaggagac gcgtgggaca ccgggctgca 2700
ggccaggcgg ggaacggccg ccgggacctc cggcgccccg aaccgctccc aactttcttc 2760
cctcactttc cccgcccagc tgcgcaggat cggcgtcagt gggcgaaagc cgggtgctgg 2820
tgggcgcctg gggccggggt cccgcacgtg cgccccgcgc tgtcttccca gggcgcgacg 2880
gggtcctggc gcgcacccga ggggcgggcg ctgcccaccc gccgagactg cactgtttag 2940
ggaagctgag gaaggaaccc aaaaatacag cctcccctcg gaccccgcgg gacaggcggc 3000
tttctgagag gacctccccg cctccgccct ccgcgcaggt ctcaaactga agccggcgcc 3060
cgccagcctg gccccggccc ctctccaggt ccccgcgatc ctcgttcccc agtgtggagt 3120
cgcagcctcg acctgggagc tgggagaact cgtctaccac cacctgcggc tcccggggag 3180
gggtggtgct ggcggcggtt agtttcctcg ttggcaaaag gcaggtgggg tccgacccgc 3240
cccttgggcg aagaccccgg ccgctcgcct cgcccggtgc gccctcgtct tgcctatcca 3300
agagtgcccc ccacctcccg gggaccccag ctccctcctg ggcgcccgcg ccgaaagccc 3360
caggctctcc ttcgatggcc gcctcgcgga gacgtccggg tctgctccac ctgcagccct 3420
tcggtcgcgc ctgggcttcg cggtggagcg ggacgcggct gtccggccac tgcagggggg 3480
gatcgcggga ctcttgagcg gaagccccgg aagcagagct catcctggcc aacaccatgg 3540
tgtttcaaaa tggggctcac agcaaacttc tcctcaaaac ccggagactt tctttcttgg 3600
atgtctcttt ttgctgtttg aagaatttga gccaaccaaa atattaaacc tgtcttacac 3660
acacacacac acacacacac acacacacac cggattgctg tccctggttc aagtgtgcca 3720
agtgtgcaga cagaacatga gcgagtctgg cttcgtgact accgaccata aacccacttg 3780
acaggggaaa catgccttgg aaggtttaat tgcacaattc caaccttgag ctgcgcgggt 3840
tccaagagcc aggcccgtac ttgctgttga tgtcattggc ttggggagtt ggggtttggt 3900
gcccagcgcg gtcgttgggg gaggggcaag gcatagaaca gtggttccca gaccttgctg 3960
cacattggaa ttacctggga ttaaaaaaaa aaaaatcaaa acaaaaacca gtgtctggct 4020
cccgccccca gacattctga tttaattggc atggggcaag acctggactt gggatttttt 4080
ttaatgctct tcatgtgatc tgttgggcag ccagatttgg ggatcactag acggaagaag 4140
gattgttaaa gtctccggag atgttacttg ccaatgctaa gagctctttg aggacatctg 4200
gaattgttac aatattgcca aatataggaa agagggaaaa ggtagagtgt gattccaata 4260
ataaaggatt ccgcttttca ttgaaggaac tggtggaaag gtttcttctc tgctgagcct 4320
gcaggcccgt cctgcctgcc tggggtgccc gggagacgcg ggcctgctcc ggagactgct 4380
gactgccggt cctgttagtc aggtgtcagc cctgtctctg ccgaagagac tcttctcttt 4440
attttaaatt aaaccctcag agcaccacca aagcatcact tttctccctc cattggtgtt 4500
ctcattcttt gatgttactt gtttgaacac cactattagt agttggagat ttgttcctga 4560
gaaaaatata aataccactt aatttgcctg tttgtcccgc attcactcaa aacagaatgc 4620
tcctgaagac aagagagaga gtaggagaac agacgctatt ccattacagt aacataaaag 4680
actggatttt caggggcaaa ttattaaaat aggagatgag ctcttttaac agaaatttgt 4740
ttaaggcctg tgtctatcaa attcagtgga ttttattcaa gatgcacttt gtttagtggg 4800
agttttgttt ggttctggga catgctaact tctagacttg ctgctcttag aggtaatgac 4860
tgccagacac catttcatga gtcctaatcc ccacattaag cataagaggt gcacactctc 4920
ctcctatggg ggaaactgag gtacgaagaa ctaaagtgac tttcccacag ctggtgggag 4980
gcagacggga 4990



<210> 3
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 3
gttccactgt tgtgcttccc 20

<210> 4
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 4
aagaaagttg ggagcggttc 20

<210> 5
<211> 19
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 5
gggtggactt gccaaagga 19

<210> 6
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 6
cttagagcca gcgtcggata 20



<210> 7
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 7
gcatgacttc aacgcaaagt 20

<210> 8
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 8
gaggctaagt gtcccactgc 20

<210> 9
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 9
tcggacttta gagcaccacc 20

<210> 10
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 10
gaggaggaga atgcgaggag 20

<210> 11
<211> 21
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 11
aaattaactc ctcagggcac c 21

<210> 12
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 12
cgcccttacc tttcttttcc 20

<210> 13
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 13
ccctgactcc gtccagtatt 20



<210> 14
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 14
cgtcctttcc tgtttccttg 20

<210> 15
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 15
accagctgtg ggaaagtcac 20

<210> 16
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 16
agacgagttc tcccagctcc 20



<210> 17
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 17
gcgcaggtct caaactgaag 20

<210> 18
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 18
ggagaagttt gctgtgagcc 20

<210> 19
<211> 19
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 19
ccctcgtctt gcctatcca 19

<210> 20
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 20
agtgatcccc aaatctggct 20

<210> 21
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 21
ggcatagaac agtggttccc 20

<210> 22
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 22
gaacaccaat ggagggagaa 20

<210> 23
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 23
tgaaggaact ggtggaaagg 20

<210> 24
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 24
catgtcccag aaccaaacaa 20

<210> 25
<211> 28
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 25
ccaccggtac cggcggccgc tggccttg 28

<210> 26
<211> 23
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 26
cggcgagaca cgcccttacc ttt 23

<210> 27
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 27
gcagcctccg ccccccgcac ggtgt 25

<210> 28
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 28
acaccgtgcg gggggcggag gctgc 25

<210> 29
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 29
gcagcctccg ccccccgcac ggtgt 25

<210> 30
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 30
acaccgtgcg gggggcggag gctgc 25



<210> 31
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 31
gcagcctcct ccccccgcac ggtgt 25

<210> 32
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 32
acaccgtgcg gggggaggag gctgc 25

<210> 33
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 33
gcagcctcct ccccccgcac ggtgt 25

<210> 34
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 34
acaccgtgcg gggggaggag gctgc 25

<210> 35
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 35
attcgatcgg ggcggggcga gc 22



<210> 36
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 36
gctcgccccg ccccgatcga at 22

<210> 37
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 37
attcgatcgg ggcggggcga gc 22

<210> 38
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic Primer
<400> 38
gctcgccccg ccccgatcga at 22
PCT/US2005/006559

Box No. I Basis of the opinion

1. With regard to the language, this opinion has been established on the basis of the international application in
the language in which it was filed, unless otherwise indicated under this item.

□ This opinion has been established on the basis of a translation from the original language into the following
language , which is the language of a translation furnished for the purposes of international search
(under Rules 12.3 and 23.1(b)).

2. With regard to any nucleotide and/or amino acid sequence disclosed in the international application and
necessary to the claimed invention, this opinion has been established on the basis of:

a. type of material:

☒ a sequence listing

□ table(s) related to the sequence listing

b. format of material:

☒ in written format

☒ in computer readable form

c. time of filing/furnishing:

☒ contained in the international application as filed.

□ filed together with the international application in computer readable form.

☒ furnished subsequently to this Authority for the purposes of search.

3. ☒ In addition, in the case that more than one version or copy of a sequence listing and/or table relating thereto
has been filed or furnished, the required statements that the information in the subsequent or additional
copies is identical to that in the application as filed or does not go beyond the application as filed, as
appropriate, were furnished.

4. Additional comments:



Box No. II Priority

1. ☒ The validity of the priority claim has not been considered because the International Searching Authority
does not have in its possession a copy of the earlier application whose priority has been claimed or, where
required, a translation of that earlier application. This opinion has nevertheless been established on the
assumption that the relevant date (Rules 43bis.1 and 64.1) is the claimed priority date.

2. □ This opinion has been established as if no priority had been claimed due to the fact that the priority claim
has been found invalid (Rules 43bis.1 and 64.1). Thus for the purposes of this opinion, the international
filing date indicated above is considered to be the relevant date.

3. Additional observations, if necessary:

see separate sheet

Box No. III Non-establishment of opinion with regard to novelty, inventive step and industrial
applicability

The questions whether the claimed invention appears to be novel, to involve an inventive step (to be non
obvious), or to be industrially applicable have not been examined in respect of:

□ the entire international application,
☒ claims Nos. 18

because:

☒ the said international application, or the said claims Nos. 18 relate to the following subject matter which
does not require an international preliminary examination (specify):

see separate sheet

□ the description, claims or drawings (indicate particular elements below) or said claims Nos. are so
unclear that no meaningful opinion could be formed (specify):

□ the claims, or said claims Nos. are so inadequately supported by the description that no meaningful opinion
could be formed.

□ no international search report has been established for the whole application or for said claims Nos.

□ the nucleotide and/or amino acid sequence listing does not comply with the standard provided for in Annex
C of the Administrative Instructions in that:

the written form □ has not been furnished
□ does not comply with the standard

the computer readable form □ has not been furnished
□ does not comply with the standard

□ the tables related to the nucleotide and/or amino acid sequence listing, if in computer readable form only, do
not comply with the technical requirements provided for in Annex C-bis of the Administrative Instructions.

□ See separate sheet for further details

Box No. V Reasoned statement under Rule 43bis.1(a)(i) with regard to novelty, inventive step or
industrial applicability; citations and explanations supporting such statement

1. Statement

Novelty (N)                     Yes:  Claims  5-10, 23-24, 31-32, 34, 37-38
                                No:   Claims  1-4, 11-22, 25-30, 33, 35-36, 39, 40

Inventive step (IS)             Yes:  Claims  9-10, 23-24, 31-32, 34
                                No:   Claims  1-8, 11-22, 25-30, 33, 35-40

Industrial applicability (IA)   Yes:  Claims  1-17, 19-40
                                No:   Claims

2. Citations and explanations
see separate sheet



Box No. VII Certain defects in the international application

The following defects in the form or contents of the international application have been noted:
see separate sheet

Reference is made to the following documents:

D1: WO 2004/011625 A (UNIVERSITY OF SOUTHERN CALIFORNIA; LENZ,
HEINZ-JOSEF; STOEHLMACHER, JA) 5 February 2004 (2004-02-05)

D2: DESAI APURVA A ET AL: "Pharmacogenomics: Road to anticancer
therapeutics nirvana?" ONCOGENE, vol. 22, no. 42, 29 September 2003
(2003-09-29), pages 6621-6628, XP002340900 ISSN: 0950-9232

D3: BUERGER HORST ET AL: "Length and loss of heterozygosity of an intron 1
polymorphic sequence of egfr is related to cytogenetic alterations and epithelial
growth factor receptor expression" CANCER RESEARCH, vol. 60, no. 4, 15
February 2000 (2000-02-15), pages 854-857, XP002340901 ISSN: 0008-5472

D4: KAGEYAMA R ET AL: "EPIDERMAL GROWTH FACTOR EGF RECEPTOR
GENE TRANSCRIPTION REQUIREMENT FOR SP1 AND AN EGF
RECEPTOR-SPECIFIC FACTOR" JOURNAL OF BIOLOGICAL CHEMISTRY,
vol. 263, no. 13, 1988, pages 6329-6336, XP002340902 ISSN: 0021-9258

D5: JOHNSON A C ET AL: "EPIDERMAL GROWTH FACTOR RECEPTOR GENE
PROMOTER DELETION ANALYSIS AND IDENTIFICATION OF NUCLEAR
PROTEIN BINDING SITES" JOURNAL OF BIOLOGICAL CHEMISTRY, vol.
263, no. 12, 1988, pages 5693-5699, XP002340903 ISSN: 0021-9258

D6: DATABASE NCBI [Online] SNP in EGFR promoter 30 March 2000
(2000-03-30), XP002340905 retrieved from NCBI Database accession no. rs712829

D7: DATABASE NCBI [Online] SNP in EGFR promoter 28 April 2000 (2000-04-28),
XP002340906 retrieved from NCBI Database accession no. rs712830

D8: LIU WANQING ET AL: "A functional common polymorphism in a Sp1
recognition site of the epidermal growth factor receptor gene promoter"
CANCER RESEARCH, vol. 65, no. 1, 1 January 2005 (2005-01-01), pages
46-53, XP002340904 ISSN: 0008-5472

Introduction 

The application discloses 12 single nucleotide polymorphisms (SNPs) located in the 
regulatory region of the human Epidermal Growth Factor Receptor (EGFR) Gene and uses 
thereof. 

SEQ ID NO: 1 and 2 are part of the genomic DNA of the EGFR locus, and SEQ ID NO: 3-38
are the corresponding primers. The application claims priority from one priority document,
which has the priority date 1 March 2004 (P1).

Re Item II

Priority 

1. Not all the priority documents were available at the time of the establishment of the
present opinion, which consequently has been established assuming that all the
claims are entitled to the earliest claimed priority. Should, however, the priority be
invalid, the Applicant is informed that document D8 would be detrimental to the
novelty and/or inventive step of the claimed subject-matter.

Re Item III

Non-establishment of opinion with regard to novelty, inventive step and industrial 
applicability 

2. Claim 18 relates to subject-matter considered by this Authority to be covered by the
provisions of Rule 67.1(iv) PCT. Consequently, no opinion will be formulated with
respect to the industrial applicability of the subject-matter of this claim (Article
34(4)(a)(i) PCT).

Re Item V 

Reasoned statement under Article 35(2) PCT with regard to novelty, inventive step or industrial
applicability; citations and explanations supporting such statement

Novelty and inventive step (Art. 33(1)-(3) PCT)

3. The subject-matter of claims 1, 4, 11-20, 25-28, 33, 35-36, 39, 40 is not new in view
of D1. D1 discloses a method for evaluating the potential efficacy of an
EGFR-targeting therapeutic agent for the treatment of cancer or predicting the clinical
prognosis for a cancer patient using determination of the sequence of a
polymorphism in EGFR, and a kit comprising a nucleic acid for determining the
sequence of a polymorphism in an EGFR gene locus (page 2, pages 43-44, pages
45-46).

The subject-matter of claims 2-3, 21-22, 29-30 is not new in view of D1. With the wording
"wherein the polymorphism is in linkage disequilibrium with", the subject-matter encompasses
any position of the EGFR regulatory region, and the polymorphism disclosed in D1 falls
within the scope of said claims.

4. The subject-matter of claims 5-8 appears to be trivial for the skilled person armed
with the disclosure of D1 and D2. D2 reviews EGFR pharmacogenomics and in
particular EGFR inhibitors such as gefitinib or erlotinib (pages 6625-6626).

5. The subject-matter of claims 9-10, 23-24, 31-32, 34 is new and inventive. The
difference between the application and D1 is the nature of the EGFR polymorphism. The
effect derived from this difference is the evaluation of the potential efficacy of an 
EGFR-targeting therapeutic agent for the treatment of cancer. Therefore, the problem 
can be defined as the provision of further EGFR polymorphisms for evaluating the 
potential efficacy of an EGFR-targeting therapeutic agent for the treatment of cancer. 
The solution of the application is the provision of a polymorphism at position -216 
with a T which is indicative of a higher expression level of EGFR. No such 
polymorphism is disclosed in the prior art documents D1-D7 in hand. 

6. For the other polymorphisms disclosed in the application as given in claim 37,
however, said problem is not solved. Therefore, the skilled person is left with an
undue burden to achieve the desired effect over the whole scope of said claims. In
consequence, no inventive step can be acknowledged for the subject-matter covering
EGFR polymorphisms other than the polymorphism at position -216 with a T.

Industrial Applicability (Art. 33(1) and (4) PCT)

7. For the assessment of the present claim 18 on the question whether it is industrially
applicable, no unified criteria exist in the PCT Contracting States. The patentability 
can also be dependent upon the formulation of the claims. The EPO, for example, 
does not recognize as industrially applicable the subject-matter of claims to the use of 
a compound in medical treatment, but may allow, however, claims to a known
compound for first use in medical treatment and the use of such a compound for the 
manufacture of a medicament for a new medical treatment. 

Re Item VII

Certain defects in the international application 

8. With the wording "wherein the polymorphism is in linkage disequilibrium with", the 
subject-matter encompasses any position of the EGFR regulatory region and therefore
lacks clarity and support in the sense of Article 6 PCT.

9. The subject-matter of claims 1-3, 19-22, 28-30, 33, 35-40 and dependent claims does
not meet the requirement of Art. 6 PCT (support) since only a very limited number of 
compounds are disclosed in the application, namely, EGFR carrying polymorphism at 
position -216 or -216 in combination with -191 which has the effect disclosed and no 
support for the whole scope of the claims can be found in the application. 